Harris 


Discourse  Analysis 
Reprints 




ZELLIG  S.  HARRIS 




Discourse  Analysis 


Reprints 




MOUTON & CO.


PAPERS ON FORMAL LINGUISTICS is a series of monographs edited
by the Department of Linguistics of the University of Pennsylvania in
cooperation with the National Science Foundation Project in Linguistic
Transformations. The editor for the Department is Henry Hiz. All
correspondence should be addressed to the Department of Linguistics,
University of Pennsylvania, Philadelphia 4, Pennsylvania.


PAPERS  ON  FORMAL  LINGUISTICS 

No.  2 


DISCOURSE ANALYSIS REPRINTS


by 
ZELLIG  S.  HARRIS 


UNIVERSITY  OF  PENNSYLVANIA 


1963 

MOUTON  &  CO.  •  THE  HAGUE 


©  Copyright  1963  Mouton  &  Co.,  Publishers,  The  Hague, 
The  Netherlands. 

No part of this book may be translated or reproduced in any form,
by print, photoprint, microfilm, or any other means, without written
permission from the publishers.


These papers are reissued from early numbers of the Transformations and
Discourse Analysis Papers (TDAP), the mimeographed series of the Department
of Linguistics, University of Pennsylvania, and of the National Science
Foundation project in linguistic transformations. Section I is reissued
from TDAP 4a (1957), Section II from Canonical Form of a Text, TDAP 3b
(1957), Section III from TDAP 3c (1957).

These  papers  do  not  represent  the  current  state  of  work  in  discourse  analysis. 
Sections  II  and  III  show  two  stages  of  analysis  of  a  scientific  article:  in  II,  the 
sentences  of  the  article  are  transformed  to  a  normal  form;  in  III,  this  transform 
of  the  article  is  reduced  to  a  summary  form.  The  Appendix  is  a  reprint  of  a 
somewhat different and still earlier form of carrying out discourse analysis.

Printed  in  The  Netherlands  by  Mouton  &  Co.,  Printers,  The  Hague 


CONTENTS


I. Discourse Analysis Manual
    1. Introduction
    2. Equivalence classes
    3. Ad hoc equivalences
    4. Transformations
    5. Procedure
    6. Interpretation

II. Discourse Analysis of a Technical Article: Normalization of a Text

III. Discourse Analysis of a Technical Article: Reduction to Tables
    0. Summary
    1. Table of optimal periods
    2. Compact form of the table of periods
    3. Reduction of the compact table
    4. Discussion

Appendix: Discourse Analysis of a Story
    1. Grammatical transformations
    2. Discourse operations
    3. Application to the text
    4. Structure of the text


I.  DISCOURSE  ANALYSIS  MANUAL 


1.  INTRODUCTION 

Discourse analysis is a method of seeking in any connected discrete
linear material, whether language or language-like, which contains
more than one elementary sentence, some global structure
characterizing the whole discourse (the linear material), or large sections
of it. The structure is a pattern of occurrence (i.e. a recurrence) of
segments of the discourse relative to each other; such relative
occurrence of parts is the only type of structure that can be
investigated by inspection of the discourse without bringing into
account other types of data, such as relations of meanings
throughout the discourse. It turns out that the segments of a discourse which
occur in a regular way relative to each other are not whole sentences
but morpheme sequences such as words, parts of words, and
phrases, or the equivalent in mathematics and other non-language
material. More exactly, such a segment is a whole constituent or a
sequence of constituents, where a constituent, for language, is a
segment of a sentence resulting from any grammatical analysis of
the sentence. These segments do not themselves occur so often and
so regularly as to constitute a pattern. Therefore we group certain of
them into classes, which do recur regularly. Discourse analysis,
then, finds the recurrence relative to each other of classes of
morpheme sequences, given a segmentation into morpheme sequences
by a suitable grammar, and having the intention that the classes
set up are such that their regularity of occurrence will correspond
to some relevant semantic interpretation for the discourse. The
problem is to set up separately for each discourse such classes as
have the greatest relevant regularity of occurrence relative to each
other within it; and if possible to find a general way of solving this
problem for any discourse.


2.  EQUIVALENCE  CLASSES 

The  basic  operation  is  the  forming  of  the  classes  of  morpheme 
sequences.  These  classes  are  formed  by  an  equivalence  relation 
(transitive  except  for  subscript,  symmetric,  reflexive)  on  morpheme 
sequences,  recursively  defined  as  follows : 

    a =_0 b  .≡.  a is the same morpheme sequence as b¹
    a =_n b  .≡.  env a =_{n-1} env b

where a, b, ... are morpheme sequences, and env a is the remainder
of the sentence which contains a; that is, env a is the sentential
environment or complement of a, and is itself a (possibly broken)
sequence of morphemes. (env a =_{n-1} env b is taken to mean that
at least some part of env a =_{n-1} the corresponding part of env b,
and that any other parts of env a =_m (m < n-1) the corresponding parts
of env b. That is, n-1 is the highest subscript of equivalence between
any part of env a and the corresponding part of env b.)

The equivalence a =_0 b is used only when we can find a chain of
equivalence with ascending subscripts. To find such an ascending
chain, it is usually necessary that a and b occur in corresponding
grammatical positions within their respective sentences, or within
the transforms of their respective sentences (see section 4 below).
Ubiquitous words like the, in, is will usually not satisfy this
condition, and are therefore useless as a base for a chain of equivalences.

¹ After the equivalence classes have been set up and their relative occurrence
studied (see 5 below), we may find reason to say that in a particular case
a ≠_0 a: i.e. to say that a particular occurrence of the morpheme sequence a
is not discourse-equivalent to the other occurrences of the morpheme sequence
a. This will happen if we find that accepting the equivalence in this
particular case forces us to equate two equivalence classes whose difference
of distribution in the double array has a reasonable interpretation. Such
situations are rare; and in any case the equivalence chain has to start with
the hypothesis that occurrences of the same morpheme sequence are equivalent
to each other in degree zero.

The equivalence c =_n d between particular morpheme sequences
may be reached by more than one chain. The degree n of the
equivalence between them will be understood to be the lowest
subscript of c = d in any chain in which c = d appears.

It  should  be  mentioned  that,  in  addition  to  the  equivalence 
operation  (identifying  equivalent  elements),  there  is  also  available 
another  operation  which  consists  in  grouping  words  according  to 
the  vocabulary  field  into  which  they  fall  (e.g.  biochemical  terms, 
logical  terms,  words  relating  to  the  activities  of  scientists).  The 
introduction  of  this  consideration  serves  as  an  important  check  on 
the  equivalence  operation.  The  present  paper,  however,  will  make 
no  use  of  this  second  operation. 

3.  AD  HOC  EQUIVALENCES 

Once  the  equivalence  operation  has  been  applied  to  a  particular 
text,  certain  additional  considerations  may  be  accepted  as  grounds 
for  placing  one  morpheme  sequence  in  subscriptless  equivalence 
with  another.  These  additional  operations  are  carried  out  only  for 
the present analysis of the present text, and have their final
justification in their ability to yield a more complete and satisfactorily
interpretable  analysis  of  the  text  in  the  direction  already  set  by  the 
equivalence  operation  alone. 

3. 1  Grammatical  parallel 

If the grammatical relation of a to f is the same as that of b to g,
then

    af = bg  .⊃.  a = b . f = g

(i.e. if two constituents are equivalent, their corresponding
grammatical parts are equivalent).

3.2  Textual  parallel 

A more uncertain equivalence, with as yet uninvestigated
restrictions, holds that corresponding grammatical parts are
equivalent even if the constituents which contain them have not been
shown equivalent, so long as these constituents are parallel in the
structure of the sentence or the sentence-sequence (cf. for example,
Appendix Ic below).

3.3  Non-recurring  adjuncts 

If d is an adjunct of a, i.e. d is such that the constituent ad (or the
constituent da) is grammatically equal to the constituent a, and
if d does not occur (or does not occur as an element in the analysis)
except with a, then

    ad = a (or da = a)

The  consideration  here  is  that  since  d  does  not  recur,  it  cannot  lead 
to  any  further  or  different  equivalences:  hence  it  cannot  affect  the 
class  into  which  a  is  put.   Hence  we  let  a  be  in  the  class  in  which  it 
would  be  if  d  had  not  occurred  at  all.   We  call  a  the  center  of  ad. 

3.4  Asserted  equivalence 
Under various restrictions,

    'a is (includes) b'  .⊃.  a = b

That is, if the text includes some transform of the sentence 'a is b'
or 'a includes b' or the like, we can in many cases set a = b.

3.5  Semantic  assumption 

We  can  arbitrarily  assume  a  =  b.  Analysis  of  a  text  often  yields 
several  sections  within  which  equivalences  have  been  established, 
but  between  which  there  are  gaps.  By  assuming  certain  equivalences 
we  can  bridge  these  gaps,  and  while  this  yields  no  new  information, 
it  gives  a  more  coherent  structure  for  the  text.  If  we  can  maneuver 
the  semantic  assumptions  sufficient  to  fill  the  gaps  into  equivalence 
pairs  which  are  most  obvious  in  general,  or  most  neutral  (or  least 
specific) to the given discourse, then the semantic cost of the
assumptions which are added to the given analysis is small compared
with  the  wholeness  of  analysis  thus  obtained.  Texts  can  often  be 
analyzed  in  such  a  way  that  some  of  the  equivalence  classes  have 
low  semantic  importance  (in  the  case  of  scientific  articles,  often  the 
verb  class)  and  it  is  desirable  to  make  the  semantic  assumptions 
within  these  classes. 



4.  TRANSFORMATIONS 

The  equivalences  of  2,  3  above  enable  us  to  group  certain  morpheme 
sequences  into  equivalence  classes  which  occur  in  some  regular 
way  in  the  discourse.  Since  most  of  the  equivalences  are  based  on 
environment  within  a  sentence-structure,  the  dissimilarities  among 
the various sentence structures of the discourse restrict the
applicability of the equivalences. The method of linguistic
transformations makes it possible to reduce some of these dissimilarities. We
want in this way to eliminate stylistic variations among the
sentences of the discourse, to align these sentences grammatically.

4.1   Transforms  of  sentence  and  discourse 

Given a sentence S_i which is a particular grammatical
arrangement g_1 of particular morphemes (words) m_j, a transform of it
TS_i is another grammatical arrangement g_2, satisfying certain
conditions, of the same m_j. Each sentence-structure grammatical
arrangement is subject to one or more transformations (including
the identity).² TS_i is itself either a sentence, or a sequence of
sentences with connectors, or a constituent to be included in a
neighboring sentence. If every sentence of a discourse is operated
on by one or more of the transformations to which its structure is
subject, we obtain from the succession of sentences S_i which
constitutes the discourse a succession of transformed sentences TS_i
which is itself a succession of sentences, even though sentences of
the original may have become parts or sequences of sentences of the
new succession. This succession of TS_i will be called a transform
(TD) of the original discourse. To the extent that TS_i paraphrases S_i
for each S in the discourse, TD paraphrases the original discourse.

² Preliminary lists of transformations were given in earlier papers on discourse
analysis and on transformations, in Language, 28 (1952), pp. 18-23, and 33
(1957), pp. 283-340. A fuller list will appear in another paper of the present
series. The transformations which are used in Section II of the present paper
are listed at the end of Section II.

The availability of various transformations for each sentence of
the discourse gives us a set of transforms T_i D of the discourse,
including the original as a succession of identity transformations
on its sentences; members of T_i D differ from each other only by
transformations. One of these T_i D is the kernel form of the
discourse: to obtain this, we use, for each sentence of the original,
those transformations which most directly transform that sentence
into a sequence of kernel sentences with connectors. The kernel
form is not in general the most suitable transform of the discourse
for discourse analysis.

4.2  Optimal  transform 

In particular, there exist one or more transforms of a discourse
which are optimal for the applicability of the equivalences of 2, 3
above (especially the equivalence chain). Let D indicate the original
discourse, and S its sentences, while D' indicates this optimal TD,
and S' its sentences.³ Then the set S' is characterized by the
property that there are more cases of various S_i' having the same
equivalence classes (i.e. the same morpheme sequences or ones
which can be set equivalent to these) in the same grammatical
position within the S_i' than happens among the sentences of any
other TD. (A more useful optimality, which cannot yet be stated
satisfactorily, is that the distribution of equivalence classes in the
same grammatical positions among the S_i' should permit a maximal
number of assignments of morpheme sequences to equivalence
classes, or should leave fewest gaps or fewest serious gaps in the
resulting discourse analysis.) We do not have to assume that the
grouping of morpheme sequences into equivalence classes is
necessarily the same for each TD, though we may expect that the various
TD will be quite similar in the equivalence classes which they admit.
But whatever the equivalence classes, the recurrence of equivalence
classes of D' within the succession S_i' will be simpler than the
recurrence of the equivalence classes of any other TD within the
sentences of that TD. We call D' the optimal TD, or optimal transform
of D, and we call the S_i', which comprise the sentences of D', the
periods of D.

³ The succession S' is the result of applying a particular sequence of
transformations to the successive sentences S_i. That is, S' is a succession
of particular transforms of S.

We  thus  obtain  the  optimal  transform  of  D  by  selecting  particular 
transformations  for  each  sentence  of  D,  and  carrying  out  the 
equivalences of 2, 3 above on the resulting sentences. The
recurrence of the resulting equivalence classes within the succession of
these  periods  is  the  regularity  of  occurrence  mentioned  in  1  above. 
Although  the  objective  is  the  discourse  equivalences,  then,  the 
obtaining  of  the  optimal  transform  is  purely  an  operation  of 
transformations,  and  the  application  of  the  equivalences  is  a 
separate  (discourse)  operation.  The  discourse  operation  can  be 
applied  directly  to  the  original  D,  but  it  will  then  in  general  leave 
gaps  at  points  where  it  might  not  leave  gaps  in  D'. 
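
The comparison among candidate transforms described in 4.2 can be sketched as a simple count. The following is only an illustration under stated assumptions: each period is represented as a mapping from a grammatical position to an equivalence-class label, the position names and class labels are hypothetical, and the score (number of position-by-position agreements between pairs of periods) is a crude stand-in for the optimality which, as noted above, cannot yet be stated satisfactorily.

```python
from itertools import combinations

def alignment_score(periods):
    """Count agreements of class and position over all pairs of periods.

    Each period is a dict: grammatical position -> equivalence-class label.
    (Assumed representation; the text does not prescribe one.)
    """
    score = 0
    for p, q in combinations(periods, 2):
        score += sum(1 for pos in p.keys() & q.keys() if p[pos] == q[pos])
    return score

def pick_optimal(candidates):
    """Return the name of the candidate transform with the highest score."""
    return max(candidates, key=lambda name: alignment_score(candidates[name]))

# Two hypothetical transforms of the same three-sentence discourse.
candidates = {
    "original": [
        {"subject": "N1", "verb": "V", "object": "N2"},
        {"subject": "N2", "verb": "V", "agent": "N1"},  # passive left as is
        {"subject": "N1", "verb": "V", "object": "N2"},
    ],
    "passive undone": [
        {"subject": "N1", "verb": "V", "object": "N2"},
        {"subject": "N1", "verb": "V", "object": "N2"},
        {"subject": "N1", "verb": "V", "object": "N2"},
    ],
}

best = pick_optimal(candidates)
```

Under this count the transform in which the passive has been undone scores higher, since its periods carry the same classes in the same positions. The count favors transforms with more periods, so it is meaningful only when the candidates divide the discourse in comparable ways.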

4.3  Periods 

Since they are the special case of the sentences of the optimal
transform of a discourse, the periods together with the connectors
between them have to cover the whole discourse. That is, there
cannot be any part of the discourse which is not a grammatical
part or connector of some period. If the discourse includes
mathematical expressions, tables, or any other material which has (or, by
transformations proper to it, can be reduced into) a linear form,⁴
this material can be included as a succession of periods in its
proper place within the discourse.

A particular case of the selection of transformations to yield
periods is as follows: If a tentative optimal transform contains
some periods S_j' which have none of the equivalence classes of the
discourse (but are residues of the segmentation of the discourse into
periods S_k' which do contain these classes), and if the connectors
C between S_j' and S_k' are subject to transformations of the type
S_k' C S_j' → S_k'(S_j') (where S(A) indicates that the sentence S
includes A as a grammatical part), then the addition of this
transformation gives a closer approach to the optimal transform of D.
That is to say, periods which do not contain the equivalence classes
are included grammatically, if possible, in neighboring periods
which do.

⁴ That is, a sequence of strings of marks.




4.4  Use  of  transformations 

The  transformations  required  to  obtain  the  optimal  transform 
are  in  part  the  same  as  those  required  to  reduce  the  discourse  to 
kernels,  and  in  part  different.  In  many  cases  we  will  not  apply 
transformations  which  we  would  apply  if  we  were  reducing  to 
kernels. We may even apply transformations in the opposite
direction to the usual. For example, consider the short discourse:

Truman?    Well,  he's  smart  and  he  isn't  smart.    He's  democratic 
and  he  isn't  democratic.   But  he's  a  politician  without  question. 

He's  smart  is  the  kernel,  and  He  isn't  smart  a  transform  of  it.   But 
in  the  optimal  transform  we  would  obtain : 


          Truman    is and isn't           smart,
          Truman    is and isn't           democratic,
    But   Truman    is without question    a politician.


Here  the  first  two  periods  each  combine  two  kernels  which  are 
usually  separated.  As  a  result  the  optimal  transform  does  not 
match  is  with  isn't  (smart),  but  rather  matches  is  and  isn't  (smart) 
with  is  without  question  (a politician):  and  isn't  and  without  question 
are  both  adjuncts  of  is,  the  former  correlating  with  smart  and  with 
democratic  and  the  latter  with  a  politician. 

In general, transformations are either S ↔ S (i.e. between one
sentence-form and another) or S ↔ connected sequence of S (in
such forms as S ↔ S_1 C S_2, S ↔ S_1(N_i) C S_2(N_i), or S ↔ S_1
(pro S_2) + S_2). That is, they change the grammatical form, or
divide or combine the original sentences. Determining the
connector C is not simple: for example, if S_1(N_1 C_a N_2) has a
transform S_1(N_1) C_b S_1(N_2), it by no means follows that C_a = C_b.

Transformations may be one-directional (→) or reversible (↔);
and they may be unrestricted, restricted, or textual. In the main
type of restricted transformation, each transformation has a fixed
schema, but with different values of the operator for different
values of the operand: If T_a takes sentences containing the verb
V_i into modal V + V_i n (for example: step → take a step, kick →
give a kick), then to each V_i there correspond particular modal V.
Textual transformations are those which hold only in the presence
of some condition in the discourse or within a stateable limited
neighborhood in the discourse: for example, for certain verb-pairs
V_m, V_n (e.g. buy - sell, lose - win) if both appear in certain kinds
of matched environment within a neighborhood, then N_1 V_m N
P_i N_2 ⇒ N_2 V_n N P_j N_1.⁵ Similarly, one-directional transformations
may be reversible in the presence of certain textual conditions. To
these may be added a purely textual basis for recasting the original
sentences into periods, namely zero recurrence by textual
parallelism: A neighborhood in a discourse may contain S_1(N_1) together
with a textually parallel S_2 which lacks N_1. If S_2 were
grammatically parallel to S_1, we would have S_2 → S_2(N_1), that is, the N_1
of S_1 occurs in zero form in S_2. If S_2 were only textually parallel
to S_1, but both contained N_1, we would apply 3.2 of the present
paper. But if S_2 is only textually parallel to S_1 and does not
contain N_1, we may nevertheless in certain cases say that N_1 occurs in
zero form in S_2, and obtain S_2 ⇒ S_2(N_1).

⁵ We use → for transformations, ⇒ for other recastings of sentences. These
are like transformations in that they apply to any values of their variables, e.g.
here to any N_1, N_2 or S_1, S_2 which occur in the required textual relations to
each other. Recastings hold for more arbitrary and smaller classes of words
than is the case in transformations.

While short lists of common transformations can be readily
determined for any language, one can also discover putative new
ones (especially restricted and textual transformations) in the
process of grammatically aligning a text, by seeing what would yield
desirable periods for a discourse analysis. It is then necessary to
see if these satisfy the conditions for a transformation.

We may note that reduction to optimal transforms requires only
what is required for transformations: sentence division, and some
system of grammatical relations (constituent analysis,
transformational history, or other).



5.  PROCEDURE 

A  preliminary  sketch  of  the  actual  procedure  of  analysis :  We  may 
work  (downward)  from  the  original  text,  or  (upward)  from  a 
kernelization  of  the  text.  First  we  note  the  oft-recurring  words  or 
morpheme  sequences  other  than  the,  and,  and  other  words  which 
are  common  in  almost  all  discourses.  We  then  note  which  of  these 
occur  frequently  in  each  other's  neighborhood  and  what  are  the 
grammatical  relations  among  them,  i.e.  their  relative  positions 
within  a  sentence  or  a  kernel. 
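
The first step of this procedure, noting the oft-recurring words and which of them occur in each other's neighborhood, can be sketched as follows. This is a minimal illustration under assumptions of our own: segmentation is by whitespace-separated words rather than by morphemes, the neighborhood is taken to be the sentence, and the stop list `UBIQUITOUS` is a hypothetical stand-in for "words which are common in almost all discourses". Grammatical relations among the words are not examined here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop list; any real analysis would choose its own.
UBIQUITOUS = {"the", "and", "a", "an", "in", "is", "of", "to", "but"}

def recurring_words(sentences, min_count=2):
    """Return words occurring at least min_count times, ignoring ubiquitous ones."""
    counts = Counter(
        word
        for sentence in sentences
        for word in sentence.lower().split()
        if word not in UBIQUITOUS
    )
    return {word: n for word, n in counts.items() if n >= min_count}

def cooccurrences(sentences, words):
    """Count the sentences in which each pair of the given words appears together."""
    pairs = Counter()
    for sentence in sentences:
        present = sorted(set(sentence.lower().split()) & set(words))
        pairs.update(combinations(present, 2))
    return pairs

# A simplified, hypothetical wording of a short discourse.
sentences = [
    "truman is smart and truman is not smart",
    "truman is democratic and truman is not democratic",
    "but truman is a politician without question",
]
recurring = recurring_words(sentences)
pairs = cooccurrences(sentences, recurring)
```

The pairs with the highest counts indicate where to look first for grammatical stretches that might be recast into tentative periods.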

We start with those grammatical stretches, sentence structures or
constituents, that contain combinations of the same recurrent
words, and we seek transformations that will recast all of them,
usually by breaking them down, into tentative periods, i.e. into
sentence structures which have these recurrent words in the same
grammatical positions. If we start with kernels, we seek all
connected sequences of kernels that contain the same combination of
two or more recurrent words, and seek transformations that will
combine each of these sequences into a period.⁶

We then turn to the grammatical stretches which contain only
one of the recurrent words that had elsewhere occurred in recurrent
combinations. Here we check whether we can show by
grammatical or by textual parallelism that the missing members of the
combination are present in zero form. If we can show this, we may
proceed.

If  we  cannot  establish  by  grammatical  or  by  textual  parallelism 
that  the  missing  members  of  the  combination  are  present  in  zero 
form,  we  seek  transformations  that  will  bring  these  stretches  into 
closest  grammatical  alignment  with  the  already  established  periods 
containing  these  words. 

⁶ In some cases, we may find that we cannot obtain periods with identical
positioning of the recurrent words (e.g. the story in the Appendix contains
GW periods, and HUGW, and HUHUGW), but that the positioning of the
recurrent words in one period is some grammatical elaboration of their
positioning in another period; sometimes we can find other transformations that
will make the similarities among the periods more manageable, e.g. if one
period simply contains the other.



Finally  we  turn  to  those  recurrent  words  that  don't  occur  in 
combination  with  others,  and  seek  transformations  that  will  align 
the  grammatical  stretches  containing  them  into  periods.  We  note 
any  similarities  or  relations  between  the  remainders  of  these  periods 
and  the  remainders  (or  recurrent  words)  of  all  other  periods,  to  see 
if  we  can  describe  any  part  of  one  period  or  set  of  periods  in  terms 
of  some  part  of  another  period. 

When  we  have  periods  whose  corresponding  grammatical  parts 
are  equivalent  (by  2,  3  above)  we  write  them  in  a  double  array, 
each  period  being  assigned  a  row,  and  each  relevant  morpheme 
sequence  a  column  (e.g.  see  4.4  above).  Then  each  column  is  an 
equivalence  class,  each  row  shows  the  composition  of  equivalence 
classes  into  a  period,  and  the  sequence  of  all  rows  is  the  transformed 
discourse  itself.  The  rows  show  the  relation  of  equivalence  classes 
to  their  periods,  and  the  columns  show  the  successive  members  of 
each  equivalence  class  in  successive  periods  of  the  discourse.  This 
tabular  form  requires  that  in  periods  which  have  an  equivalence 
class  in  common,  corresponding  classes  should  be  written  in  the 
same order. To this end A^{-1} B may be written to indicate a period
consisting of the sequence B A, so that if we have period structures
ACE, EBF, BA, we may write

    A        C    E
                  E    B    F
    A^{-1}             B

even if there is no transformation which yields a grammatical
inversion of BA. (The superscript -1 expresses a twisting of the
array.)
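
The double array can be sketched as a list of rows over a fixed order of columns. The sketch below is an illustration only: the class labels `C`, `N`, `V`, `A` are hypothetical names for the columns, the periods are those of the Truman example in 4.4, and the twisting notation A^{-1} is not modelled.

```python
def double_array(periods, columns):
    """Lay out periods as rows and equivalence classes as columns.

    periods: list of dicts, class label -> morpheme sequence filling that
             class in the period (assumed representation).
    columns: class labels in the order chosen for the array.
    A class absent from a period is shown as an empty string.
    """
    return [[period.get(col, "") for col in columns] for period in periods]

def column(array, columns, label):
    """Return the successive non-empty members of one equivalence class."""
    idx = columns.index(label)
    return [row[idx] for row in array if row[idx]]

columns = ["C", "N", "V", "A"]  # hypothetical class labels
periods = [
    {"N": "Truman", "V": "is and isn't", "A": "smart"},
    {"N": "Truman", "V": "is and isn't", "A": "democratic"},
    {"C": "But", "N": "Truman", "V": "is without question", "A": "a politician"},
]

array = double_array(periods, columns)
for row in array:
    # Pad each cell so that the columns line up when printed.
    print(" | ".join(cell.ljust(19) for cell in row))
```

Reading `column(array, columns, "A")` gives the members of one equivalence class in the order of the periods, while each row gives the composition of a single period.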

Assignment of morpheme sequences to columns is based on the
equivalences of 2, 3 above. Since every equivalence of degree
n+1 is based on an equivalence of degree n, we can keep a record
of all equivalences in such a way as to show the lowest degree of
equivalence between any two equivalents. For example, if the
transformed discourse contains periods composed of the morpheme
sequences a, b, ..., as follows (where a_i is the i-th occurrence of a):

    a_1  g
    a_2  e_1
    b    e_2
    b    f
    c    f
    d    g

we can write the equivalences:

    g =_1 e_1 =_0 e_2 =_1 f

or coalescing around =_0, and omitting the subscript 1 (so that =
represents =_1):

    g = e = f

In such a sequence of =_1 the degree of equivalence between entries
is the sum of the subscripts between them; e.g., g =_2 f. To calculate
this according to the chain in 2 above:

    e_1 =_0 e_2
    ∴ a =_1 b
    ∴ g =_2 f

A similar sequence of =_1 can be made for the environments of e,
f, g, and the two sequences can be so arranged that the
environment which establishes an equivalence will be found written above
or below the equivalence sign:

       g  =  e  =  f
    d  =  a  =  b  =  c

Here a =_1 d in the environment g (since we have ag and dg),
e =_1 g in the environment a; d =_3 c (in the environments g =_2 f),
and so on, forming a pyramid.
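
The degrees read off in this example can be recomputed mechanically from the recursive definition in 2 above. The sketch below is an illustration under simplifying assumptions: every period has exactly two parts, so that the environment of each part is simply the other part; all occurrences of the same morpheme sequence are taken as equivalent in degree zero; and only parts in the same position are compared. The function name and the data layout are our own.

```python
from itertools import product

def equivalence_degrees(periods):
    """Lowest degree n with x =_n y, for two-part periods (left, right)."""
    env = {}  # (position, item) -> set of environments seen with that item
    for left, right in periods:
        env.setdefault((0, left), set()).add(right)
        env.setdefault((1, right), set()).add(left)

    # Degree 0: each morpheme sequence is equivalent to itself.
    degree = {(pos, x, x): 0 for (pos, x) in env}
    n = 0
    while True:
        n += 1
        new = {}
        for (pos, x), (pos2, y) in product(env, repeat=2):
            if pos != pos2 or (pos, x, y) in degree:
                continue
            other = 1 - pos
            # x =_n y if some environment of x is =_{n-1} some environment of y.
            if any(degree.get((other, ex, ey)) == n - 1
                   for ex in env[(pos, x)] for ey in env[(pos, y)]):
                new[(pos, x, y)] = n
        if not new:
            return degree
        degree.update(new)

# The six periods of the example, with a_1, a_2 and e_1, e_2 written a and e.
periods = [("a", "g"), ("a", "e"), ("b", "e"), ("b", "f"), ("c", "f"), ("d", "g")]
degrees = equivalence_degrees(periods)
```

Because pairs are assigned level by level, the first degree recorded for a pair is the lowest one, which is the sense of degree adopted in 2 above. For the six periods this gives, for instance, degree 1 for a and d, degree 2 for g and f, and degree 3 for d and c.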

In many cases, one or more of the equivalence classes (the
columns or equivalence sequences mentioned above) which occur in
a  set  of  periods  are  found  to  have  few  members  (i.e.  the  same 
members  recur  in  many  periods)  or  only  small  differences  among 
their  various  members,  while  the  remainders  (usually  long)  of  these 
periods  are  very  different,  and  are  equated  with  each  other  by 
help  of  the  more  constant  sections. 

Also, in many cases we find, or can obtain by transformations, a
number of similarly constructed periods, say bac, bed, fid, in which
one or two positions (which we might consider tentative columns
of the double array) contain words of low semantic importance or
small semantic distinction relative to the discourse, or to the set
of periods (the middle position in this case). This is of course a
matter of semantic judgment. These will often be single words or
short phrases, or words from vocabulary fields other than those
central to the discourse. We will then make our semantic
equivalences (3.5 above) in these positions, obtaining first a = e = i,
whence b = f and c = d.


6.  INTERPRETATION 

The  special  interest  in  discourse  analysis,  aside  from  its  relevance 
to  language  structure,  derives  from  the  fact  that  the  structure,  as 
seen  for  example  in  the  double  array  of  equivalence  classes,  permits 
useful interpretations in respect to the particular discourse.
Interpretations about how the assertions in the discourse are related to
each other, or about the separability of a discourse into sections
which relate to each other may be made. We may find, for example,
that statements about the investigator's activities enclose
statements about the facts of the science and that statements about data
are  connected  by  statements  of  argument.  The  types  of  discourse 
structure,  and  their  interpretations,  will  be  discussed  in  later  papers 
of  this  series. 


II.  DISCOURSE  ANALYSIS  OF  A  TECHNICAL  ARTICLE: 
NORMALIZATION  OF  A  TEXT 


The Structure of Insulin as Compared to that of Sanger's A-Chain¹

by

K. Linderstrøm-Lang and John A. Schellman

1  The  optical  rotatory  power  of  proteins  is  very  sensitive  to  the 
experimental  conditions  under  which  it  is  measured,  particu- 

2  larly  the  wavelength  of  the  light  which  is  used.  Consequently 
single  measurements  of  optical  rotation  do  not  give  an  ade- 
quate description  of  rotatory  properties,  though  they  have 
often  proved  very  useful  in  the  characterization  of  proteins 

3  and  the  detection  of  changes  in  structure  in  solution.  The 
diversity  of  the  factors  which  affect  optical  rotation  is  in  many 
ways  an  advantage,  for  the  variation  of  each  parameter  yields 
a  separate  source  of  experimental  information. 

4  One  of  the  authors  (J.A.S.)  has  for  some  time  been  engaged 
in  a  study  of  the  rotatory  properties  of  several  proteins  inclu- 
ding  the   effect   of  temperature,   wavelength,   pH   and   the 

5  denaturation  reaction.  One  phase  of  this  research,  the  depend- 
ence of  the  rotatory  properties  of  proteins  on  wavelength,  is 
recorded  here  because  it  is  of  special  importance  to  the 
problem  at  hand. 

6  In agreement with the observations of HEWITT it was
found that all the polypeptide systems which were investigated
obeyed a one term Drude equation within experimental error
(usually less than 1 %);

$$[\alpha] = \frac{A}{\lambda^{2} - \lambda_c^{2}}$$

where [α] is the specific rotation and λ the wavelength of the

7  measurement. A and λc have a certain amount of theoretical
significance but will be regarded here as empirically deter-
mined quantities.

¹ Reprinted from Biochimica et Biophysica Acta, XV (1954), pp. 156-7.

8  The  dependence  of  the  optical  rotation  on  experimental 
conditions  results  from  the  fact  that  A  is  in  general  a  function 

9  of temperature, pH, ionic strength, denaturation, etc. λc, on
the other hand, varies only as a result of drastic changes in the
protein system, in particular denaturation by urea or guanidine

10  or titration to a pH between 10.5 and 12.0. The values of λc
and $[\alpha]_D^{20}$ are given in Table I for a number of proteins and

11  related substances. ($A_{20}$ may be obtained from $[\alpha]_D^{20}$ by means

12  of the relation $A_{20} = [\alpha]_D^{20}(\lambda_D^{2} - \lambda_c^{2})$.) λc is not dependent on
temperature within experimental error (approximately ± 50 Å).

13  The  results  for  gelatin  are  taken  from  CARPENTER  AND 
LOVELACE. 

14  (TABLE  I) 

15  Except  for  insulin  the  specific  rotations  were  independent 

of  moderate  changes  in  protein  concentration. 

16  The substances in Table I are listed in the order of descending

17  values of λc. The table has been divided into two groups to
emphasize an obvious pattern, namely, all the substances in
Group A are native globular proteins, all those in Group B
are not.

18  The entries in Table I are too few to permit any certain
generalizations but the indication is that λc provides a measure
of the presence of secondary structure in polypeptides, having
values above 2400 Å for the ordered configurations of native
proteins, and values less than 2300 Å for polypeptides in dis-

19  ordered states. We are here subscribing to the view that heat
and  chemical  denaturation  result  in  the  unfolding  of  the 



20  hydrogen  bonded  structure  of  a  protein.  Clupein  naturally 
falls  into  the  latter  group  because  the  internal  repulsion  due  to 
its  high  positive  charge  makes  folded  molecular  forms  un- 
stable. 

21  The  presence  of  the  oxidized  A-chain  of  insulin  in  Group  B 

22  was  not  expected.  In  fact  it  was  hoped  that  the  A-chain  would 
exist  as  an  a-helix  in  solution  and  would  therefore  serve  as  a 
model  substance  of  known  structure  in  the  study  of  denatura- 

23  tion.  Instead  it  was  found  that  the  A-chain  possesses  rotatory 
properties  which  resemble  those  of  clupein  very  closely,  but 

24  do  not  resemble  those  of  insulin  itself.  Most  striking  is  the 
fact  that  the  specific  rotations  of  clupein  and  the  A-chain  are 
virtually  unaffected  by  strong  solutions  of  urea  and  guanidine 

25  chloride.  Ordinary  proteins,  including  insulin,  undergo  chan- 
ges in  specific  rotations  of  100%  to  300%  under  these  condi- 

26  tions.  These  results  suggest  that  the  oxidized  A-chain  is 
largely  unfolded  in  aqueous  solution  and  are  in  agreement 
with  the  recent  finding  that  the  peptide  hydrogen  atoms  of  the 
A-chain exchange readily with D₂O, whereas those of insulin
do  not. 

27  A  detailed  report  of  this  work  will  appear  later. 

We first seek a set of words or morphemes which recur in several
sentences and which are in the same grammatical relations to each
other or in relations that seem to be transformable into the same
relations.² Since these several sentences will in general contain
additional material which obscures the relations among our words,
we want to break these sentences up into smaller clauses, that is,
included sentences, Sᵢ, one or more of which will contain this set
of words. To obtain these Sᵢ we divide, by transformations or by
recastings, each of the sentences into sections, words or phrases,
to the extent necessary for grouping the sections into the desired
Sᵢ. We then combine these sections grammatically into Sᵢ, and
the Sᵢ into the original sentence. In constructing the Sᵢ out of the
grammatical sections, we fill in all zero recurrences which are
indicated by the conjunctions. For example, Nᵢ V C V indicates
zero recurrence of Nᵢ, yielding Nᵢ V C Nᵢ V. We now consider
each Sᵢ and from among the transformations which can be applied
to it, that is, those which are applicable to the structure of that Sᵢ,
we select whatever transformation will bring it into the same
grammatical structure as other Sᵢ or their transforms. These
transformed Sᵢ then contain the same set of words in the same
grammatical relations, as far as possible; they are therefore the
periods of the discourse, i.e. the sentences of the optimal transform
of the discourse.

² In most cases, we consider morpheme sequences that are set up by a standard
grammar.
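The filling in of zero recurrences can be illustrated with a small Python sketch. It rests on assumptions of our own: a sentence is represented only by its sequence of class labels, the conjunction is written C, any label beginning with N counts as a noun phrase, and only a zeroed subject is restored. The function names are hypothetical.

```python
def fill_zero_recurrence(labels, conjunction="C"):
    """Restore a subject noun phrase that is zeroed after a conjunction.

    Illustrative only: handles the N V C V pattern, not zeroing in general.
    """
    result = []
    subject = None
    at_clause_start = True
    for label in labels:
        if label == conjunction:
            result.append(label)
            at_clause_start = True
            continue
        if at_clause_start:
            if label.startswith("N"):
                subject = label           # remember the most recent subject
            elif subject is not None:
                result.append(subject)    # subject was zeroed: fill it in
            at_clause_start = False
        result.append(label)
    return result

def split_into_clauses(labels, conjunction="C"):
    """Divide the filled-in label sequence into included sentences at each C."""
    clauses, current = [], []
    for label in fill_zero_recurrence(labels, conjunction):
        if label == conjunction:
            clauses.append(current)
            current = []
        else:
            current.append(label)
    clauses.append(current)
    return clauses

filled = fill_zero_recurrence(["N1", "V", "C", "V"])
clauses = split_into_clauses(["N1", "V", "C", "V"])
print(filled, clauses)
```

A sequence whose second clause already has its own subject is left unchanged, so the function can be applied to every sentence without first checking for zeros.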

We  begin  with  the  following  prominent  set  of  words: 
optical  rotation  (in  sentences  2,  3,  8) 

optical  rotatory  power  in  proteins  (in  1) 
rotatory  property  (in  2,  23) 

rotatory  property  of  proteins  (in  4,  5) 

specific  rotations  (in  6,  15,  24,  25) 

Of  these  sentences,  1,  4,  5,  8,  and  25  also  contain  conditions  or 
wavelength. 

Next, starting with sentence 1, we divide each of these sentences
into sections. The sections are separated by slant bars and are
marked by their grammatical classification: N noun phrase, V
verb phrase, A adjective phrase, P preposition (of, on, for, by), C
conjunction, S sentence-structure. The subscript numbers indicate
the particular occurrence of the grammatical class of which the
section is a member. Pro-morphemes (e.g. pronouns) are marked
with the class and number of the section whose grammatical recur-
rence they constitute. For example, in the man who the -o is a
grammatical recurrence of the man. If the further division of a
section will not contribute to obtaining the desired Sᵢ, the section
is not divided further.

1. The optical rotatory power of proteins / is very sensitive / to
                    N₁                           V₁
   the experimental conditions / under / which / it / is measured /,
               N₂                        C N₂   N₁      V₂
   particularly / the wavelength of the light which is used.
        C                         N₃

Now we combine the sections into Sᵢ, filling in all indicated zero
recurrences, and combine the Sᵢ into the original sentence.

S₁ = N₁ V₁ to N₂
S₂ = N₁ V₂ under N₂
S₃ = N₁ V₁ to N₃

S C S ↔ S; C S yields three separate periods:³


               | the optical rotatory power of proteins | is measured under    | the experimental conditions
wh             | the optical rotatory power of proteins | is very sensitive to | the experimental conditions
, particularly | the optical rotatory power of proteins | is very sensitive to | the wavelength of the light which is used


whence the following equivalences are established by 2 of Section I:

is very sensitive to =₁ is measured under
the experimental conditions =₁ the wavelength of the light which is used

We continue this procedure for each of the sentences of the discourse:


4. One of the authors (J.A.S.) has for some time been engaged
   in a study of / the rotatory properties of several proteins
                                     N₁
   / including the effect / of / temperature / , / wavelength / , / pH /
            V₁a                       N₂       C       N₃      C   N₄
   and / the denaturation reaction.⁴
    C               N₅

³ This transformation will apply in all successive sentences in which C occurs,
and will not be listed again. The semi-colon indicates a sentence division
resulting from transformation. The first division is S₁ wh S₂ (where wh S₂ is
an adjectival phrase of N₂ in S₁), plus C S₃; hence the centers are S₁ and S₃,
and the zeros of S₃ are filled from S₁ (is sensitive to). Next the wh is treated as
a C, and S₁ wh S₂ is divided into S₁ and S₂.

S₂ = N₁ V₁ of N₂
S₃ = N₁ V₁ of N₃
S₄ = N₁ V₁ of N₄
S₅ = N₁ V₁ of N₅
S₁ = One ... study of N₁

The transformation S₁ (A N₁) ↔ S₁ (N₁); N₁ Av changes N₁ V₁a of
N₂ into S₂ above, and so for the others, yielding:⁵


    | the rotatory properties of several proteins | include the effect of | temperature
    | the rotatory properties of several proteins | include the effect of | wavelength
    | the rotatory properties of several proteins | include the effect of | pH
and | the rotatory properties of several proteins | include the effect of | the denaturation reaction


The equivalence

temperature =₁ wavelength =₁ pH =₁ the denaturation reaction

is established by 2 of Section I, and the equivalence

wavelength = the wavelength of the light which is used
by 3.3 of Section I.
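The step from the conjoined objects of sentence 4 to four separate periods, and from these to a first-degree equivalence among the objects, can be sketched as follows. The sketch is ours and makes simplifying assumptions: each period is a triple (subject, verb, object), and entries are grouped whenever they share exactly the same subject and verb. The function names are hypothetical.

```python
from collections import defaultdict

def expand_conjuncts(subject, verb, objects):
    """Give one period per conjoined object, repeating the shared subject and verb."""
    return [(subject, verb, obj) for obj in objects]

def group_by_environment(periods):
    """Collect third-position entries occurring after the same (subject, verb) pair."""
    groups = defaultdict(list)
    for subject, verb, obj in periods:
        groups[(subject, verb)].append(obj)
    return dict(groups)

subject = "the rotatory properties of several proteins"
verb = "include the effect of"
periods = expand_conjuncts(
    subject,
    verb,
    ["temperature", "wavelength", "pH", "the denaturation reaction"],
)
groups = group_by_environment(periods)
for period in periods:
    print(" | ".join(period))
```

All four objects fall into a single group because their environments are identical. This corresponds to the equivalence established by 2 of Section I; the further equivalence of wavelength to the wavelength of the light which is used is not of this kind and is not captured by the sketch.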

5. One phase of this research / , / the dependence / of / the
                                         V₁n
   rotatory properties of proteins / on / wavelength / , / is recorded...
                 N₁                           N₂


⁴ Xy indicates an element of class X to which an affix or other element has been
added such that the resultant (X plus y) is in class Y. Sn therefore is a nomi-
nalization of S. S(X) indicates that S contains X (usually as a noun-phrase).
⁵ This is a special case of the word-sharing transformations: X + Y; Y[Z] ↔
X + Y[Z] when the common element Y is center of Y[Z] and on a par with X
in the X + Y construction. The A here is an adjectivized verb Va (plus object),



S₂ = N₁ V₁ on N₂

S₁ = One ... research, S₂n (or: the following), is recorded...
The transformation S₁ (S₂n) → S₁ (pro-S₂n); S₂ changes depend-
ence of, which is V₁n P, to depend, which is V₁, yielding:⁶

the rotatory properties of proteins | depend on | wavelength.
From sentences 4 and 5, the following equivalences are established:
several proteins = proteins
depend on =₁ include the effect of

8a. The dependence / of / the optical rotation / on / experimental
          V₁n                      N₁
    conditions / results...
        N₂

S₂ = N₁ V₁ on N₂

S₁ = S₂n results... or This results...
The same transformation as in 5 changes S₂n (i.e. V₁n of N₁ on
N₂) into S₂, yielding:

the optical rotation | depends on | experimental conditions
Since rotation is Vn (from rotate), and rotatory properties or power
is Va N' (where N' is a subset of nouns, including these and capacity,
condition, etc.) we have the restricted transformation V₁a N' of
N₂ → V₁n of N₂, whence -tion = -ory properties = -ory power (since
optical rotatory = rotatory)⁷ whence the zero after rotation is =
of proteins,⁸ and include the effect of (in 4) =₁ is very sensitive to
(in 1).


and the v which changes it into a verb consists merely in dropping the a (-ing).
Due to a degeneracy of English transformations one could also apply here the
transformation S₁ (S₂n) → S₁ (pro-S₂n); S₂. The two results are roughly
equivalent here.

⁶ Here too there is an alternative transformation applicable: N₁, N₂, V₁ →
N₁ V₁; N₁ is N₂. In our case this would yield One phase of this research is the
dependence... wavelength. Since we would prefer our period to be purely S₂,
to match the preceding periods, we do not use this transformation.

⁷ A₁ A₂ N is a degenerate transform both of A₁ A₂ + A₂ N and of A₁ N + A₂
N. Here we accept the former, taking optical as adjunct of rotatory, not of
power, i.e. taking the phrase as a transform of power of optical rotation not of
optical power of rotation. We do so because optical rotation occurs elsewhere
here, while optical power does not.

⁸ Alternatively, we could add of proteins by the rather uncertain textual
parallel (3.2 in Section I), and obtain -tion =₂ -ory properties.




We thus have equivalences: (1) among all N₁ above, i.e. all are
members of one equivalence class, R; (2) among all V above, so
that all V P are members of one class, W; (3) among all entries
in the third columns above, all members of a third equivalence class,
K. Then all the periods above have the structure
R of proteins W K
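A small Python sketch may make the claim about a common structure concrete. It is an illustration only: the table `CLASSES` lists a few of the entries assigned above to R, W and K, the selection of periods is ours, and `of proteins` is carried along as a literal section since no class has yet been set up for it.

```python
# Hypothetical membership table, filled from a few of the periods obtained above.
CLASSES = {
    "R": {"the optical rotatory power", "the rotatory properties"},
    "W": {"is very sensitive to", "is measured under", "depend on"},
    "K": {
        "the experimental conditions",
        "the wavelength of the light which is used",
        "wavelength",
    },
}

def class_of(entry, classes=CLASSES):
    """Return the name of the class containing `entry`, or None if unclassified."""
    for name, members in classes.items():
        if entry in members:
            return name
    return None

def structure(period, classes=CLASSES):
    """Replace each classified section by its class name; keep other sections as they are."""
    return " ".join(class_of(entry, classes) or entry for entry in period)

PERIODS = [
    ("the optical rotatory power", "of proteins", "is very sensitive to",
     "the experimental conditions"),
    ("the optical rotatory power", "of proteins", "is measured under",
     "the experimental conditions"),
    ("the optical rotatory power", "of proteins", "is very sensitive to",
     "the wavelength of the light which is used"),
    ("the rotatory properties", "of proteins", "depend on", "wavelength"),
]

structures = {structure(period) for period in PERIODS}
print(structures)
```

That the set of structures has a single member is what is meant by saying that the periods share one structure; a period with an unclassified section would show up as a second, different string.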

3. The diversity of the factors / which / affect / optical rotation /
is...

S₁ = N₁ wh S₂ is...
The passive transform of S₂ is

optical rotation | is affected by | diversity of factors
We can fit this into the preceding set only by assuming (semanti-
cally) is affected by = include the effect of and adding of proteins by
footnote 8. Then diversity of factors becomes a member of K.⁹
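The way a new section is assigned to a class once the rest of its period is classified can be sketched in Python. The sketch is ours and simplifies the procedure: it assumes a fixed pattern of classes for the positions of a period, and it assigns a section only when it is the single unclassified one. The function name and the small membership table are hypothetical.

```python
def infer_member(period, pattern, classes):
    """Assign the one unclassified section of `period` to the class the pattern expects.

    Returns the class name on success, or None when the period does not fit the
    pattern or when more or fewer than one section is unclassified.
    """
    unknown = []
    for position, (entry, expected) in enumerate(zip(period, pattern)):
        known = next(
            (name for name, members in classes.items() if entry in members), None
        )
        if known is None:
            unknown.append(position)
        elif known != expected:
            return None  # a classified section conflicts with the pattern
    if len(unknown) != 1:
        return None
    position = unknown[0]
    classes[pattern[position]].add(period[position])
    return pattern[position]

classes = {
    "R": {"optical rotation"},
    "W": {"include the effect of", "is affected by"},  # second entry assumed semantically
    "K": {"temperature", "wavelength"},
}
period = ("optical rotation", "is affected by", "diversity of factors")
result = infer_member(period, ("R", "W", "K"), classes)
print(result, sorted(classes["K"]))
```

The assignment says nothing about the degree of equivalence between the new member and any particular old member of the class, in keeping with the derivation used here.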


24. Most striking is the fact that / the specific rotations / of /
    clupein / and / the A-chain / are virtually unaffected / by / strong
    solutions of urea and guanidine chloride.

S₂ = N₁ of N₂ V₁ by N₄
S₃ = N₁ of N₃ V₁ by N₄
S₁ = Most striking is the fact that S₂ and S₃ or Most striking are
the following facts

S₁ (that S₂) → S₁ (pro-S₂n); S₂ (similar to the transformation in
5 above):

    | the specific rotations | of clupein     | are virtually unaffected by | strong solutions
and | the specific rotations | of the A-chain | are virtually unaffected by | strong solutions

clupein =₁ the A-chain
specific rotations = rotations
are virtually unaffected by = is affected by¹⁰
In 26 we will obtain A-chain =₁ insulin and in 25 insulin =₁ protein,
hence all of these are members of one equivalence class which we
may call H. Then strong solutions of urea and guanidine chloride
=₃ diversity of factors and all the periods obtained so far have the
structure
R of H W K

⁹ Given periods x y (i.e. whose parts are members of equivalence classes x
and y) and a period x z, we can derive that z is a member of y without specifying
its degree of equivalence to a particular member of y.

25. Ordinary proteins / , including / insulin / , / undergo changes /
           N₁                C           N₂             V₁
    in / specific rotations / of / 100% to 300% / under / these
                N₃                      N₄                  N₅
    conditions.

S₁ = N₁ V₁ in N₃ of N₄ under N₅
S₂ = N₂ V₁ in N₃ of N₄ under N₅
The restricted transformation N₁ V P N₃ → N₃ of N₁ V yields:¹¹


            | specific rotations | of ordinary proteins | undergo changes of 100% to 300% under | these conditions
, including | specific rotations | of insulin           | undergo changes of 100% to 300% under | these conditions

ordinary proteins = proteins
insulin =₁ proteins
these conditions is in K because these is an adjectival recurrence of
the corresponding section of 24 (K), while conditions is a nominal-
izer of the N' subset (see 8a above). Hence
undergo changes... under =₂ are virtually unaffected by, so that
here too we have
R of H W K

¹⁰ The meaning of un- does not alter the position of virtually un- as adjuncts
of affected. Discourse equivalence does not necessarily mean semantic simi-
larity, but rather identity of class relation. Both are virtually unaffected by and
is affected by connect members of R of H as subject with members of K as
object.

¹¹ This requires that we state to what words or classes of words this transfor-
mation is restricted, and that the pair rotations, proteins with undergo change is
included in these.

15. Except / for / insulin / the specific rotations / were indepen-
dent / of / moderate changes in protein concentration.

S₁ = N₂ V₁ of N₃
Except for N₁, S₁ → S₁; but not S'₁ (where S'₁ = S₁ with N₁
replacing or adjoining some N of S₁ which is similar to N₁ in its
co-occurrents). If P N₁ is added in S'₁, we may add in S₁ a cor-
responding P non-N₁. This is a special case of the C-transforma-
tions (e.g. in 1 above). The only locations that meet these conditions
are to have N₁ replace protein or P N₁ added after rotations; we select
the latter because it gives an R of H W K structure identical with the
other periods.


        | the specific rotations | (for non-insulin) | were independent of | moderate changes...
but not | the specific rotations | for insulin       | were independent of | moderate changes...

Since the first three columns (after the connective) are R P H W,
moderate changes in protein concentration is a member of K. We
note that protein occurs here as part of a member of K, though else-
where it is itself a member of H.

23.  Instead  it  was  found  that  /  the  A-chain  /  possesses  /  rotatory 

Ni  Fi  N, 

properties  /  which  /  resemble  /  those  /  of  /  clupein  /  very  closely 


C  No 


No 


N, 


(to  F2) 


/ ,  but  /  do  not  resemble  /  those  /  of  /  insulin  itself. 
C         Kg  N,  N^ 




S₂ = N₁ V₁ N₂ wh N₂ V₂ N₂ of N₃
S₃ = N₁ V₁ N₂ wh N₂ V₂ N₂ of N₄
To bring rotatory properties into its usual subject position:
N₁ Vh N₂ → N₂ of N₁ (where Vh is a verb subset including have,
possess, etc.).

To combine the conjoined parts of each S:

N₂ of N₁ wh N₂ V₂ → N₂ of N₁ V₂ (see footnote 5): rotatory
properties of the A-chain resemble those...

To avoid periods that contain members of R twice we use the
restricted transformation N₁ Ve N₂ → N₁ Ve Nx; N₂ Ve Nx (where
Ve is a verb subset including equal, resemble, etc., and Nx is some
particular unstated N) which yields:


        | rotatory properties | of the A-chain    | resemble very closely Nx
        | rotatory properties | of clupein        | resemble very closely Nx
but not | rotatory properties | of insulin itself | resemble very closely Nx


clupein 


insulin    (also    from    24, 


whence    the    A-chain 

We now proceed to other sentences which contain members of
H and of K (solution 2, 22, 26; denaturation 19); in these and in 20
we will find a recurring word folded.


19. We are here subscribing to the view that / heat / and /
                                               N₁     C
    chemical denaturation / result in / the unfolding of the hydrogen
             N₂                V₁                    N₃
    bonded structure / of / a protein.
                               N₄

'*  Alternatively,  to  bring  rotatory  properties  into  its  usual  subject  position  we 
transform  to  the  passive  of  the  first  5  in  S2,  S3,  obtaining  A'^  Vi'^  Ni  (rotatory 
properties  are  possessed  hy  tlw  A-chain).  We  combine  the  two  conjoined  5  in 
53  (and  so  in  ^2)  by  N^  K,  wh  N2  V^  (the  English  order  is  ^2  wh  N^  Vj.  V^)  <-+ 
Nt  V,a  y^  (or  A'2  M  K,). 




S₁ = We ... view that S₂ and S₃ (or We ... to the following view)
The restricted passive for V P (here, for result in) brings S₂, S₃
grammatically closer to the preceding periods:

    | the unfolding... structure | of a protein | results from | heat
and | the unfolding... structure | of a protein | results from | chemical denaturation

heat =₁ chemical denaturation = denaturation

20b. Clupein naturally falls into the latter group / because / the
                                                       C
     internal repulsion due to its high positive charge / makes / folded
                          N₁                                V₁
     molecular forms unstable.
              N₂

S₂ = N₁ V₁ N₂
S₁ = Clupein ... group C S₂ (or: ... group because of the
following)
The passive of S₂ would be read Folded molecular forms are made
unstable by... but its constituent structure is

folded molecular forms unstable | are made by | the internal repulsion...¹³

26. These results suggest that / the oxidized A-chain / is largely
unfolded / in / aqueous solution / and are in agreement with the
recent finding that / the peptide hydrogen atoms / of / the A-
chain / exchange readily with D₂O / , whereas / those / of /
insulin / do not.

¹³ There are partial similarities between 19 and 20: semantical between results
in and makes; and N₃ of 19 → hydrogen-bonded-structure folds with un- as
adjunct of folds while N₂ of 20 → molecular forms are folded with unstable as
adjunct of this sentence. Here folds may be a transform of is folded because of
a restricted relation between certain verbs in passive and in intransitive position.
If we therefore match the columns of 19 and 20, we could assume of H (and in
particular of clupein) after N₂ in 20; but this is not derivable.

S₂ = N₁ V₁ in N₂
S₃ = N₃ of N₁ V₂; C N₃ of N₄ V₂
S₁ = These results suggest that S₂ and ... finding that S₃

S₃ does not have the R of H W K structure; but we can derive
from it the A-chain =₁ insulin and obtain two periods N₃ of H V₂.
S₂ can be grammatically aligned with 19 by N₁ V₁ (with whatever
P N₂) → V₁ing of N₁ (with same P N₂); with N P N₂ → N is P N₂
to regain sentence status:

being largely unfolded | of the oxidized A-chain | is in | aqueous solution

The first column here has the same center (unfold) as in 19 and
the two are in the same class (by inclusion of adjuncts). Hence
is in =₄ results from.¹⁴

22.  In  fact  it  was  hoped  that  /  the  A-chain  /  would  exist  /  as  / 

N,  V, 

an  a-helix  /  in  /  solution  /  and  /  would  therefore  serve  /  as  / 
N2  N3  V, 

a  model  of  known  structure  /  in  /  the  study  of  denaturation. 

A^4  A^o 

S^  =  Ni  Vi  as  No  in  N3 

^3  =  Ni  F2  as  Ni  in  N^ 

5j  =  In  fact  .  .  .  that  S2  and  S3  (or:  In  fact  the  following  things 

were  hoped) 

The similarity between the oxidized A-chain is largely unfolded in
aqueous solution (26) and the A-chain would exist as an α-helix in
solution suggests that similar grammatical changes be obtained
here. We use a restricted transformation

N₁ V₁ as N₂ → N₁ V₁ as N₂ V₁ → N₂ as N₁ V₁ → N₂ as' N₁ V'
where as' is chiefly N' of (see 8a above) e.g. form of, capacity of,
and V' is V₁ or a verb phrase related to V₁.

    | an α-helic form                               | of the A-chain | would exist in | solution
and | a model-substance-of-known-structure capacity | of the A-chain | would serve in | the study of denaturation

¹⁴ From 26 matched with 19 via solution (24) =₃ diversity of factors (3) =₁
denaturation (4) and A-chain (26) =₁ insulin (25) =₁ protein.
2.  Consequently  /  single   measurements   of  optical    rotation    / 

do  not  give  an  adequate  description  of  /  rotatory  properties  /  , 
Vi  N,  C 

though  /  they  /  have  often  proved  very  useful  in  /  the  character- 

ization  /  of  /  proteins  /  and  /  the  detection  /  of  /  changes  /  in  / 

Structure  /  in  /  solution. 

M  =  Vn  in  N^  in  N^ 
S,^  =  Ny_  V^in  N^ofNi 
53  =  A^i  V^inN.ofM 

S₁ is unusual in containing two occurrences of R. They are not
separable by transformation, differently from 23. S₂ contains R
and H but in a grammatical relation that is not a transform of the
relation between these in our periods. S₃ contains R and K, again
in a different grammatical relation. But one phrase within it, M,
can be matched with the second set of periods by the restricted
transformation V₁n in N₂ ↔ N₂ V₁

structure | changes in | solution

Similarity to the periods above suggests that here as in 20 we could
assume of H (and in particular of proteins from the corresponding
N₄) after structure; but this is not derivable.

From  19,  20,  26,  22,  2  we  have  obtained  periods  containing,  e.g., 


unfolding  of  .  . .  structure 
folded  forms  unstable 
being  unfolded 
a- helix 
structure 


of  H 


results  from 

are  made  by 

is  in 

exist  in 

serve  in 

change  in 


Not all the entries in column 1, or in column 3, have been shown
equivalent. If we assume is in, exist in, are made by, change in, to
be in the same equivalence class, all in column 3 would be equiva-
lent; thence all of column 1 would be equivalent; thence adding
of H would be derivable for 2 (though it would have to be assumed
for 20 to obtain assignment of repulsion to K). Then if we assume
change = undergo changes (25), column 3 is assigned to W, whence
column 1 becomes equivalent to R. Only these assumptions, then,
are needed to give this second set of periods the same R of H W K
structure as the first.

We now consider the remaining sentences which contain the
word structure or members of K (8, 9, 12). It will be seen that they
all contain λc or A, two items from the equation of sentence 6.


18.  The  entries  in  Table  I  are  too  few  to  permit  any  certain 
generalizations  but  the  indication  is  that  /  Xc  /  provides  a  measure 

of  the  presence  of  /  secondary  structure  /  in  /  polypeptides  /  ,  / 

having  /  values  /  above  2400  A  /  for  /  the  ordered  configura- 


A^. 


TV. 


Na 


r  2  ti  i  1  J»6  -"6 

tions  /  of  /  native  proteins  /  ,  and  /  values  /  less  than  2300  A  / 

A^7  C  N^  P  iVg 

for  /  polypeptides  /  in  /  disordered  states. 
A^3  N, 




S^  =  Ac  Vi  N.,  in  ^3 

S^  =  ?ic  V,N,PNJbr  N.ofN, 

Si  =  Ac  V^N^P  NJor  N,,inN^ 

Sx  =  The  . .  .  indication  is  that  S2,  S3,  and  S^  (or :  The  indication 

is  the  following) 

Since N₂ is in R and N₃ in H, we would like to move these to subject
position, by the restricted passive of verb phrases:


secondary structure    | in polypeptides        | is provided a measure of presence | by λc
ordered configurations | of native polypeptides | yield values                      | of λc
ordered configurations | of native polypeptides | yield values                      | above 2400 Å
disordered states      | of polypeptides        | yield values                      | of λc
disordered states      | of polypeptides        | yield values                      | less than 2300 Å

The  passive  of  Ac  has  values  P  N^for  N^  was  N^  yields  values  P  N^ 
of  Ic-  Since  in  14  (the  Table)  numbers  in  A  (such  as  N^  and  A^^ 
here)  occur  in  the  column  of  Xc,  we  obtain  this  arrangement  here 
by  . . .  V  P  NiP  Nj  -^  ...  V  P  Nf.  ...  V  P  Nf.  . . .  values  of  Xc; 
. .  .  values  above  2400  A.  \x\  S^  we  found  polypeptides  in  the  first 
instead  of  the  second  position  of  the  N  P  N  phrase;  we  aligned  ^4 
with  the  others  by  the  restricted  transformation  which  inverts 
noun  order:  N^  P^  ^3  -*■  ^3  Pi  ^i  (where  Pt  is  usually  different 
from  7*2).  The  first  two  columns  are  then  R  6»/h.  The  third  column 
has  some  similarity  to  w  {is  measured  under)  in  sentence  1,  but  this 
can  not  be  derived  transformationally. 


12. λc / is not dependent on / temperature / within experimental
    N₁           V₁                N₂               (to V₁)
    error (approx. ± 50 Å).

V₁ is a member of W (with adjunct); N₂ is a member of K.



9. λc / , on the other hand, / varies only as a result of / drastic
               C                         V₁                    N₂
   changes in the protein system / , in particular / denaturation / by /
                                          C               N₃
   urea / or / guanidine / or / titration to a pH between 10.5 and 12.0.
    N₄    C       N₅       C                   N₆

S₂ = λc V₁ N₂
S₃ = λc V₁ N₃ by N₄ (C involves removal of only)
S₄ = λc V₁ N₃ by N₅
S₅ = λc V₁ N₆

In the case of the first or, the simple noun N₅ following it was con-
sidered parallel to the simple noun N₄ preceding or, with zero
recurrence of the rest of the sentence (λc V₁ N₃ by); for the second
or, the Vn-phrase N₆ following it was considered parallel to the
Vn-phrase N₃ by N₄ (or N₅) which precedes this or, with zero
recurrence of the rest of the sentence (λc V₁). Internal structure of
material on one side of C often but not always indicates what
material parallels it on the other side. Since N₃ (with its adjuncts)
is a member of K, so are N₂ and N₆. Hence 9 is λc V₁ K, while 12
is λc W K, whence V₁ is a member of W.

8b.  Sgn  (see  8a  above)  results  from  the  fact  that  /  A  /  is  in  general 

a  function  of  /  temperature  /  ,  /  pH  /  ,  /  ionic  strength  /  ,  /  dena- 

N,  N,    C    N,  C  N, 

turation  /  ,  etc. 

S,  =  N,  V,  7V3 
S,  =  N,  V,  N, 

Si  =  S'gAi  results  from  the  fact  that  S3,  S^,  S^  S^,  etc. 
N2,  N3,  A^6,  and  hence  also  A^4  are  in  k.  Vi  is  new  but  semantically 
close  to  w.''^ 

"  We could derive A = λc (and hence V1 = W) from 7. However, it is not
safe to rely on environments such as in 7, totally unrelated to the environments
in respect to which our equivalence classes have been set up. These may be



6. In ... that / all the polypeptide systems which were investigated /
obeyed / a one term Drude equation / ... [α] = A/(λ² − λc²)
where / [α] / is / the specific rotation / and / λ / the wavelength
of the measurement.
   N1   V1   N2   N3   N4

S3 = N1 V1 the equation
S4 = [α] is N3
S5 = λ is N4
S1 = In ... that S2, S3 where S4 and S5

In S3, [α] is R (by S4) and λ is K (by S5), while N1 is H (if we take
polypeptide systems = polypeptides). Then we have

H obeyed N2
H obeyed R = A ÷ (−λc² + λ²)

We find that we can relate all the preceding sets of periods to this.
The first two sets reduced to R of H W K. By applying the restricted
N1 of N2 → N2 has N1 (see under 23) we obtain a form closer to that
of the equation:

H V R Wa K

(e.g., proteins have rotatory properties dependent on wavelength for
sentence 5; where V indicates the verb connecting H to K).

Then 18 has the form

H V R Ua λc

and 12, 9 have

λc W K

while 8b has (if we assume its V1 = W)

A W K


found to constitute different structures, with different classes. Similarly, the
equivalence of [α]²⁰ and λc in 10 is of doubtful relevance, and is destroyed by
the fact that these have different positions in the analysis of the formula in 11,
and in the table (14). Finally λc in 12 is equivalent to R of H in 5, 8, 15; but [α]
is R (in 6), and [α] and λc have different positions in the analysis of the formula
in 6.

"  Ni  Vi  a  N2  Vi  N3  -^  Ni  Vi  IV2  Vi  a  N3.  When  two  sentences  are  combined, 
either  the  first  verb  or  the  second  is  adjectivized,  as  in  fn.  5.  In  summarizing 
H  v  R,  etc.  we  will  omit  the  a  which  indicates  that  all  but  one  of  the  verbs  must 
be  adjectivized. 




We can express the similarity of these sentences to the formula by
saying that = is in U, while the operations between A, λc and λ
are in W, so that the formula period is (putting L for λc and A)

H V R U L W K

11. A20 / may be obtained from / [α]²⁰_D / by means of the relation /
A20 = [α]²⁰_D (λD² − λc²)
          V1                                 C

S1 = A20 V1 [α]²⁰_D
The equation S2 is again R U L W K rearranged.
A20 may be obtained from [α]²⁰_D is the passive of R yields L.
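The relation in sentence 11 is the one-term Drude equation of sentence 6 solved for A. A minimal Python sketch of the two forms follows; the function names and the numerical values are illustrative assumptions, not figures from the article's Table I.

```python
def drude_rotation(a_const, wavelength, lambda_c):
    """One-term Drude equation: [alpha] = A / (wavelength^2 - lambda_c^2)."""
    return a_const / (wavelength ** 2 - lambda_c ** 2)


def drude_constant(rotation, wavelength, lambda_c):
    """Rearranged form of sentence 11: A = [alpha] * (wavelength^2 - lambda_c^2)."""
    return rotation * (wavelength ** 2 - lambda_c ** 2)


# Illustrative numbers only (assumed, not taken from the article).
LAMBDA_D = 5893.0    # assumed wavelength of the D line, in angstroms
lambda_c = 2400.0    # assumed dispersion constant, in angstroms
rotation_d = -30.0   # assumed specific rotation at the D line

a_20 = drude_constant(rotation_d, LAMBDA_D, lambda_c)
recovered = drude_rotation(a_20, LAMBDA_D, lambda_c)
```

Computing A from a measured rotation and then feeding it back into the equation returns the original rotation, which is all that "R U L W K rearranged" asserts about the two forms.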

14. The table lists substances such as insulin, values of [α]²⁰_D,
values of λc, and conditions such as pH. Hence every line in the
table can be analyzed as
H has value of R and has value of L under K; i.e. H V R U L W K.

There  remain  several  sentences  which  contain  references  to  the 
table,  in  particular  to  Groups  A,  B  in  it  (footnote  17). 

10. The values / of / λc / and / [α]²⁰_D / are given in Table I for /
a number of proteins / and / related substances.

S2 = N1 of λc V1 for N2
S3 = N1 of λc V1 for N3
S4 = N1 of [α]²⁰_D V1 for N2
S5 = N1 of [α]²⁰_D V1 for N3
These can all be transformed into the active:

a number of proteins             are given in Table I values of   [α]²⁰_D   and values of   λc
a number of related substances   are given in Table I values of   [α]²⁰_D   and values of   λc

which is H V R U L.

"     The subdivision of the table into groups A and B can be treated as names
of the two sets of entries. A name is a member of the column which it names


17. The table has been divided into two groups to emphasize an
obvious pattern, namely, / all the substances / in Group A / are /
native globular proteins / , / all those / in Group B / are not.

S2 = N1 are P N2
S3 = N1 are N3 (S3 connected to S2)
S4 = N1 are P N4
S5 = N1 are not N3 (S5 connected to S4)
S1 = The ... pattern, namely, S2 ... S5

S3 indicates the substitutability of N3 for N1 in S2, and S5
substitutes not N3 for N1 in S4. Hence we are left with:
S'2 = N3 are P N2
S'4 = not N3 are P N4
Since native occurs both here and in 18, we try to extract it, leaving
protein as subject.

A N → N is A
Ni is A; Ni is P N → Ni which is A is P N

proteins   which are   native globular       are all in   Group A
proteins   which are   not native globular   are all in   Group B

When R of H is transformed to H has R or the like (to satisfy 6, 14),
native will be extracted from 18 as follows:
N1 P N2 → N2 has N1 (applied to all R of H) yields native proteins
have ordered configurations. Then A N2 → N2 is A yields proteins
are native. And N2 is A; N2 has N1 → N2 which are A and have
N1: proteins | which are | native and have ordered configurations |
yield values | above 2400 Å.

Since 18 is thus H V R U L, so is 17; and Group A, B is in L.
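The three rewriting steps used above to extract native from sentence 18 can be sketched as plain string operations. This is only an illustration under simplifying assumptions: the function names are hypothetical, and the grammatical categories are supplied by hand rather than recognized automatically.

```python
def n_of_n_to_has(n1, n2, verb="has"):
    """N1 P N2 -> N2 has N1."""
    return f"{n2} {verb} {n1}"


def adjective_to_predicate(adjective, noun, copula="is"):
    """A N -> N is A."""
    return f"{noun} {copula} {adjective}"


def merge_on_shared_noun(noun, adjective, n1, copula="are", verb="have"):
    """N is A; N has N1 -> N which are A and have N1."""
    return f"{noun} which {copula} {adjective} and {verb} {n1}"


# The phrases are the ones quoted in the analysis of sentences 17 and 18.
step1 = n_of_n_to_has("ordered configurations", "native proteins", verb="have")
step2 = adjective_to_predicate("native", "proteins", copula="are")
step3 = merge_on_shared_noun("proteins", "native", "ordered configurations")
```

Each function mirrors one transformation schema; chaining them reproduces the period "proteins which are native and have ordered configurations" that is aligned with sentence 17.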

20a.  Clupein  /  naturally  falls  /  into  the  latter  group.  .  . 
N1   V1   P N2

Since  latter  is  pro-adjective  of  the  last  adjective  of  group,  it  is  a 

(i.e. Ni is Nj means that Ni and Nj are in the same column, as in 6 above).
Hence Group A and Group B are members of the columns of values [α]²⁰_D and
λc.



recurrence of B (in 17). Hence P N2 here = P N2 in 21. Since N1
here =1 N1 below, V1 here =2 V1 below.

21. The presence of / the oxidized A-chain of insulin / in Group B /
was not expected.
    V1n   N1   P N2

S2 = N1 V1 P N2
S1 = S2n was not expected
As in 5, 8a:

the ... A-chain ... | is present in | Group B

The only sections of the text that remain, grammatically separated
from the periods given above, turn out to have for the most
part words entirely different from those of our characteristic
columns. There is too little word recurrence in these sections to
yield a detailed discourse structure, but in any case they constitute
a separate set of periods. These periods may be very important
to the main body. For example, S 22a indicates that S 22b was a
hope but not a result. Since these new periods often include
periods of the main body as noun phrases within them, we may call
the new set a metadiscourse of the main body, the relation of
metalanguage to object language being in part that words or sentences
of the object language are named in noun phrases of the metalanguage.
Examples are (this indicates included object period):

S 3a:    This is in many ways an advantage.
S 3c:    The variation of each parameter yields a separate source of
         experimental information.
S 4a:    One of the authors has for some time been engaged in this study.
S 5a, c: One phase of this research is recorded here.
S 5c:    It is of special importance to the problem at hand.
S 6a:    This agrees with the observations of Hewitt.
S 6a:    This was found.
S 6a:    These were investigated.
S 21a:   This was not expected.
S 22a:   In fact this was hoped.
S 23a:   Instead, this was found.
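The observation that these sections show too little word recurrence can be made concrete with a rough measure. The sketch below is an assumption-laden proxy, not the procedure of the paper (which separates the metadiscourse grammatically): it scores a sentence by the share of its words found in a hand-picked vocabulary of the characteristic columns. The function name and the vocabulary are hypothetical.

```python
def recurrence(sentence, vocabulary):
    """Fraction of the words of a sentence that occur in the given vocabulary."""
    words = [w.strip(".,:;").lower() for w in sentence.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in vocabulary)
    return hits / len(words)


# Hypothetical vocabulary drawn from the H, R and K columns.
column_words = {
    "proteins", "polypeptides", "rotatory", "properties",
    "rotations", "wavelength", "temperature",
}

main_score = recurrence(
    "Proteins have rotatory properties dependent on wavelength.", column_words
)
meta_score = recurrence("This was not expected.", column_words)
```

A sentence of the main body shares most of its content words with the columns, while a metadiscourse sentence such as S 21a shares none.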



The  following  is  a  list  of  transformations  used  in  this  paper. 
Restricted  ones  are  marked  *.  The  numerals  following  the  trans- 
formations indicate  sentences  of  the  discourse  to  which  the  trans- 
formations were  applied. 

1. Pro-morphemes (including zero) → the words whose grammatical
recurrence they constitute.

2. Conjunction:

S C S ↔ S; C S (see footnote 3; obtaining the S may require
filling in zeros) Except for Nj, Si → Si; but not Si (Nj) - 15.

3. Word-sharing (see footnote 5)

Ni Vj; Ni Vk → Ni Vj a Vk or Ni Vj Vk a - 6 (depending on which
V is taken as center)

Si (A Nj) → Si (Nj); Nj is A - 4

Ni of Nj wh Ni Vk → Ni of Nj Vk - 23

4. Sentence inclusion

Si (Sj n or that Sj) → Si (pro-Sj n); Sj - 5, 8, 24

5. S ↔ Sn

Ni Vj → Vj ing of Ni - 26
*Ni Vj → Vj n in Ni - 1
Ni Vh Nj → Nj of Ni - 23, 6
Ni Pj Nk → Ni is Pj Nk - 26
Ai Nj → Nj is Ai - 17

6. N → N

*Ni Pj Nk → Nk Pl Ni - 18
*Vi a N' → Vi n - 5

7. S → S

Ni Vj (*Pk) Nl → Nl Vj⁻¹ (*Pk) Ni - Passive; V⁻¹ is be V en by.
*Ni Vj Pl Nk → Nk of Ni Vj - 25
*Ni Ve Nj → Ni Ve Nx; Nj Ve Nx - 23
*Ni Vj as Nk → Nk as⁻¹ Ni Vj - 22


III.  DISCOURSE  ANALYSIS  OF  A  TECHNICAL  ARTICLE: 
REDUCTION  TO  TABLES 


Summary :  By  means  of  (formal)  grammatical  transformations,  a 
text  is  reduced  to  a  sequence  of  maximally-similar  sentence  struc- 
tures called  periods  (Table  1).  These  are  rearranged,  to  the  extent 
necessary  to  bring  similar  periods  together  (Table  2);  the  reordering 
and  the  coalescing  of  some  of  the  similar  periods  may  require 
knowledge  of  the  less  specialized  word  meanings  in  the  text.  Since 
each  period  is  a  sequence  of  equivalence  classes  (with  possibly  some 
added  material),  a  set  of  similar  periods  can  be  further  reduced 
(partially  with  the  aid  of  some  semantic  information)  by  just 
showing  how  the  entries  in  each  class  vary  in  respect  to  each  other, 
i.e.  how  they  correlate  (Tables  4,  5).  This  tabulation  yields  not 
only  an  unredundant  summary  of  the  facts  stated  and  the  arguments 
made,  but  also  the  possibility  of  inspecting  these  for  organization, 
completeness,  etc. 

We present here the discourse analysis of the complete brief article
given in Section II. The sentences of the original were transformed
into periods that would be most similar to other periods obtainable
from other sentences of the same article. "Similar" here means:
containing the same words in the same grammatical position
within the periods; or where the words are not the same, providing
maximal opportunity to put words which are in the same grammatical
position into an equivalence class. Two words or phrases
a, b are equivalent by the following discourse operations, summarized
from Section I:

a =0 b if a is the same word as b, and occupies the same grammatical
position in a period.



a =n b if env a =n-1 env b (env x, the environment of x, is all the
rest of the period in which x appears, except for connectors
and introducers in most cases).

a = b if there is a sentence a is b or its transform in the discourse.

a = ab if b is a grammatical modifier of a and does not recur
otherwise in the discourse.

a may be set = b by assumption; this is done in order to complete
the analysis, and it is preferable to assume equivalences
which are least special or crucial to the discourse.

When the successive sentences have been transformed into a
sequence of periods (among which there is the maximum amount
of similarity) we have the optimal transform of the discourse.
Table 1 presents this optimal transform, which we may read through
as a roughly equivalent paraphrase of the original text. Each
successive line (row) in the table is a period. Each column is an
equivalence class, i.e. every member of a column is related to every
other member of the column by one of the equivalences listed above.
The column arrangement makes it easy to exhibit the similarity
among periods. Inspection shows what types of period occur
where, what changes there are within periods, how the periods can
be grouped into a summary, and what critique one can make of
the text on the basis of the structure and arrangement of the periods
by showing what columns are filled in for which periods, what
changes there are within columns, etc.
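The environment-based equivalence can be sketched in a few lines of Python. This is a deliberately simplified, one-step version: two entries are merged only when they fill the same position with literally identical environments, whereas the definition above is recursive (environments need only be equivalent at the previous step). The function name and the tuple representation of a period are assumptions made for the illustration.

```python
from collections import defaultdict


def equivalence_classes(periods):
    """Group entries that occur in the same position with the same environment."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    by_env = defaultdict(list)
    for period in periods:
        for i, entry in enumerate(period):
            # The environment is the rest of the period, keyed by position.
            env = (i, period[:i] + period[i + 1:])
            by_env[env].append((i, entry))
            find((i, entry))
    for members in by_env.values():
        for other in members[1:]:
            union(members[0], other)

    classes = defaultdict(set)
    for key in parent:
        classes[find(key)].add(key[1])
    return sorted(sorted(c) for c in classes.values())


# Periods 3-5 of Table 1, slightly shortened.
periods = [
    ("proteins", "have", "optical rotatory power",
     "very sensitive to", "the experimental conditions"),
    ("proteins", "have", "optical rotatory power",
     "measured under", "the experimental conditions"),
    ("proteins", "have", "optical rotatory power",
     "very sensitive to", "the wavelength"),
]
classes = equivalence_classes(periods)
```

On these three periods the sketch places very sensitive to with measured under, and the experimental conditions with the wavelength, which corresponds to the W and K columns of the table.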


1.     TABLE  OF  OPTIMAL  PERIODS 

The transformations in Section II showed that the text contains a
great many periods constructed out of the equivalence-class
sequence H V R U L W K (or H R L K, since the others are connecting
verbs), with various omissions. Interspersed among these are other
periods, given in the Metadiscourse column below, which contain
other words. About half of these contain a pronoun of a neighboring
H R L K period which is here marked by its address. Most
of the other metadiscourse periods contain a phrase from the


5 

P 

C 

H 

V 

R 

u 

T 

1 

Insulin 

has 

structure 

2 

Sanger's  A-chain 

has 

structure 

I 

3 

Proteins 

have 

optical  rotatory  power 

4 

wh 

Proteins 

have 

optical  rotatory  power 

5 

particularly 

Proteins 

have 

optical  rotatory  power 

2 

6 

7 

8 
9 

Consequently 

though 

and 

rotatory  properties 
structure 

3 

10 
11 

12 

In  many  ways 
for 

Proteins 

have 

optical  rotation 

4 

13 

several  proteins 

have 

rotatory  properties 

14 

several  proteins 

have 

rotatory  properties 

15 

several  proteins 

have 

rotatory  properties 

16 

several  proteins 

have 

rotatory  properties 

17 

several  proteins 

have 

rotatory  properties 

5 

18 
19 

20 

because 

proteins 

have 

rotatory  properties 

6 

21 

22 

the  polypeptide  syst. 

obeyed 
within  exper- 
imental error 

a  - 

one- 

23 

the  polypeptide  syst. 

obeyed 
within  usu- 
ally less  than 
1% 

a  - 

one- 

24 

the  polypeptide  syst. 

obeyed 
within  usu- 
ally less  than 
1% 

[a]                 1 

the  specific  rotation  J 

25 

all  wh 

the  polypeptide  syst. 

7 

26 
27 

but 

v 

w 

K 

MctadisLourse 

S 

compared  with  2 
compared  with  1 

T 

very  sensitive  to 
measured  under 
very  sensitive  to 

the  experimental  conditions 

the  experimental  conditions 

the  wavelength  ofthc  light  which 

is  used 

1 

changing  in 

solution 

are  not  given  adequate  descrip- 
tion by  single  measurements  of 
optical  rotation 

the  latter  has  often  proved  very 
useful  in  the  characterization  of 
proteins 
...in  the  detection  of  9 

2 

affected  by 

a  diversity  of  factors 
a  diversity  of  factors 

is  an  advantage 

the  variation  of  each  parameter 
yields  a  separate  source  of  ex- 
perimental information 

3 

including  the  effect  of 
including  the  effect  of 
including  the  effect  of 
including  the  effect  of 

temperature 
wavelength 
pH 
the  denaturation  reaction 

One  of  the  authors  (J.A.S.)  has 
for  some  time  been  engaged  in 
a  study  of... 

4 

depending  on 

wavelength 

One  phase,  18,  of  this  research 
is  recorded  here 

it  is  of  special  importance  to  the 
problem  at  hand 

5 

iterm- 
iterm- 

Drude- 
Drude- 

+ 

equation 

equation 

A«)'                   1 

the  wavelength  of  the  > 

measurement              J 

22-4  was   found,   in   agreement 
with  the  observations  of  Hewitt 

were  investigated 

6 

Ac 

Ac 

have  a  certain  amount  of  theore- 
tical significance 
will  be  regarded  here  as  empiri- 
cally determined  properties 

7 

TABLE  1  (continued) 

S 

P 

C 

H 

V 

R 

U       ' 

8 

28 
29 
30 
31 
32 

results  from      1 
fact  in  general  J 

(proteins 

have) 

optical  rotation 

9 

33 

34 
35 
36 

on  the  other  hand 

in  particular 
or 
or 

10 

37 

a  number  of  proteins 

have  values  of 

[a]'^ 

and  value: 

38 

and 

a  number  of  related 
substances 

have  values  of 

[a?^ 

and  valueft 

11 

39 

[a]i« 

from" 

40 

by  means  of 
relation 

Mi? 

12 

41 

42 

13 

43 

gelatin 

14 

44a 

(The  substance... 

has  value 

of  -[a]20 

and  val 

ff. 

Insulin 

has  value 

— 

and  val 

— 

has  value 

— 

and  val 

— 

has  value 

— 

and  val 

— 

has  value 

— 

and  val 

— 

has  value 

— 

and  val 

3-lactoglobulin 

has  value 

— 

and  val 

— 

has  value 

— 

and  val 

— 

has  value 

— 

and  val: 

— 

has  value 

— 

and  vail 

— 

has  value 

— 

and  vail 

15 

45 

(proteins 

for-') 

the  specific  rotations 

46 

but 

insulin 

for' 

the  specific  rotations 

16 

47 

the  substances 

have  val 

17 

48 

49 

r  substances 
1     proteins 

native  globular,  all 

are 

50 

J  substances 
1  proteins 

not  native  globular, 
all 

arc 

f 

vv 

K 

Mc'tadisiourse 

S 

dependent  on 
is  a  function  of 
is  a  function  of 
is  a  function  of 
is  a  funct'on  of 

experimental  conditions 
temperature 

PH 

ionic  strength 

denaturation,  etc. 

8 

Ac 

Ac 
Ac 

varies  only  as  a  rcjult  of 

varies  as  a  result  of 
varies  as  a  result  of 
varies  as  a  result  of 

drastic  changes  in  the  protein 

system 

denaturation  by  urea 

denaturation  by  guanidine 

titration  to  a  pH  between 

10.5  and  12.0 

9 

line 
!'jes 

Ac 
Ac 

given  in  Table  I 
given  in  Table  1 

10 

cm' 

(-A? 

+ 

A^)- 

may  be  obtained 

11 

Ac 
Ac 

is  not  dependent  within 

experimental  error  on 

is  not  dependent  within 

approximately  ±  50A  on 

temperature 
temperature 

12 

has  results  taken  from  Carpenter 
and  Lovelace 

13 

vail 
vail 
'all 
all 
vail 
all 
all 
all 
all 
all 
all 

Ac 

(Ggoup 
A 

<  Group 
B 

under 
under 
under 
under 
under 
under 
under 
under 
under 
under 
under 

conditions) 
pH  3,2% 

denaturation  by  7M  urea 

(is)  Table  I 

14 

were  independent  of 
were  not  independent  of 

moderate  changes  in  protein 

concentration 
moderate  changes  in  protein 

concentration 

15 

all 

fAc 

listed  in  descending  order  in 
Table  I 

16 

roup  A 
roup  B 

The  table  has  been  divided  into 
two  groups  to  emphasize  an 
obvious  pattern,  49-50 

17 

TABLE  1  (continued) 

S 

P 

C 

H 

V 

R 

u 

18 

51 
52 

but 

53 

polypeptides 

have 

secondary  structure 

whose  pre 
is  provid 
measure 

54 

proteins 

native  with  ordered 
configurations 

have  va 

55 

polypeptides 

in 

disordered  states 

have  va; 

19 

56 

57 

a  protein 

has 

unfoldingof  the  hydro- 
gen bonded  structure 

i 

58 

a  protein 

has 

unfoldingof  the  hydro- 
gen bonded  structure 

20 

59 

clupein 

natural! 
lulls  in 

60 

because 

(clupein 

has) 

unstable  folded 
molecular  forms 

21 

61 
62 

the  oxidized  A-chain 
of  insulin 

is  presen 

22 

63 

In  fact 

64 

the  A-chain 

as 

an  a-hclix 

65 

and  therefore 

the  A-chain 
as  model  substance 

of 

known  structure 

23 

66 

Instead 

67 

the  A-chain 

possesses 

rotatory  properties 

68 

clupein 

possesses 

rotatory  properties 

69 

but 

insulin  itself 

possesses 

rotatory  properties 

24 

70 

71 

clupein 

has 

specific  rotations 

72 

and 

clupein 

has 

specific  rotations 

73 

and 

the  A-chain 

has 

specific  rotations 

74 

and 

the  A-chain 

has 

specific  rotations 

25 

75 

ordinary  proteins 

have 

specific  rotations 

76 

including 

insulin 

has 

specific  rotations 

26 

77 

78 

the  oxidized  A-chain 

is 

largely  unfolded 

79 

and 

80 

the  A-chain 

has 

peptide  hydr.  atoms 

81 

whereas 

insulin 

has 

peptide  hydr.  atoms 

27 

82 

Meladiscourse 


'Ic 
iO  A 
oO  A 


The  entries  in  Table  I  are  too 

few  to  permit  any  certain 

generalizations 

the  indication  is  53-5 


We  are  here  subscribing  to  the 
view  57-8 


19 


led-to     by 
led-to     by 


heat 
chemical  denaturation 


IP  B 


made     by 


internal  repulsion  due  to 
its  high  positive  charge 


20 


,pB 


21 


61  was  not  expected 


would  exist  in 
would  serve  in 


solution 
the  study  of  denaturation 


64-5  was  hoped 


22 


67-9  was  found 
resembling  X  very  closely 
resembling  X  very  closely 
not  resembling  X 


23 


The  fact  71-4  is  most  striking 


virtually  unaffected  by 
virtually  unaffected  by 

virtually  unaffected  by 
virtually  unaffected  by 


strong  solution  of  urea 

strong  solution  of  guanidinc 

chloride 

strong  solution  of  urea 

strong  solution  of  guanidine 

chloride 


24 


undergoing  changes  of 
100%  to  300 "o  under 

undergoing  changes  of 
100%  to  300%  under 


these  conditions 
these  conditions 


25 


exchanging  readily  with 

not  exchanging  readily 

with 


aqueous  solution 

D,0 

D«0 


Results  71-6  suggest  78 

results  71-6  are  in  agreement 
with  recent  finding  80-1 


26 


A  detailed  report  of  this  work 
will  appear  later 


27 



H  R  L  K  columns;  a  few  are  entirely  independent.  All  these  do  not 
run  together  into  a  connected  metadiscourse  (i.e.  a  text  about  the 
H  R  L  K  text),  but  constitute  individual  statements  about  individual 
parts  of  the  H  R  L  K  text. 

In Table 1 above: Parentheses, except in the equation, indicate
material added. A curved bracket (as in periods 24, 44, 49) indicates
a is b; the superscript -1 indicates reverse order: a b⁻¹ c = c b a. In
some cases different applicable transformations would have yielded
a different division between H and R, or U and L, etc.; but this
would not affect the result. The column C contains introducers of
a period and connectors between periods P. T indicates Title.

2.  COMPACT  FORM  OF  THE  TABLE  OF  PERIODS 

In  general,  the  various  entries  in  a  column  are  not  simply  substi- 
tutable  for  each  other;  they  are  equivalent  only  in  the  definition  of 
Section  I.  The  equivalences  which  yielded  these  periods  only 
enable  us  to  compare  periods,  not  to  equate  them.  However,  we 
can  try  to  combine  periods  into  summary  periods.  For  example, 
if  periods  49-50  vary  r  with  l,  with  h  constant,  we  can  obtain  a 
single  statement  of  the  correlation,  or  if  periods  67-9  kept  r  k 
constant  while  varying  h  and  w,  we  could  summarize  the  variation 
of  H  in  respect  to  w.  In  this  way  we  may  reduce  certain  sets  of 
periods  to  a  covariance  of  columns.  Other  sets  of  periods  may  be 
summarized  more  simply.  These  are  the  ones  in  which  the  same 
words  appear  in  various  rows  in  a  column,  occurring  with  different 
members  of  the  neighboring  columns.  If  there  does  not  seem  to  be 
any  reasonable  correlation  (e.g.  between  the  various  w  and  K  in 
periods  1-3,  10,  14-18,  28-32),  and  if  the  different  members  of  a 
column  X  seem  to  have  no  difference  in  meaning  relevant  to  the 
text,  we  can  combine  the  members  of  X  as  textual  synonyms. 
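The idea of reducing a set of periods to a covariance of columns can be sketched as follows. The helper below, whose name and dictionary representation are assumptions, separates the columns that stay constant over a set of periods from those that vary; the sample entries are abbreviated from periods 67-9 of Table 1.

```python
def covariance(periods, columns):
    """Split columns into those held constant and those that vary across periods."""
    constant, varying = {}, {}
    for col in columns:
        values = [p.get(col) for p in periods]
        if len(set(values)) == 1:
            constant[col] = values[0]
        else:
            varying[col] = values
    return constant, varying


# Abbreviated versions of periods 67-9.
periods_67_9 = [
    {"H": "the A-chain", "R": "rotatory properties", "W": "resembling"},
    {"H": "clupein", "R": "rotatory properties", "W": "resembling"},
    {"H": "insulin itself", "R": "rotatory properties", "W": "not resembling"},
]
constant, varying = covariance(periods_67_9, ["H", "R", "W"])

# The summary statement is the pairing of the varying columns.
correlation = list(zip(varying["H"], varying["W"]))
```

Here R is constant, so the three periods collapse into a single statement of how H varies with respect to W.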

It will be clear that this part of the work is at present still
tentative. All we know is that the formally obtainable sequence of
periods is still redundant: summaries and correlations can be
extracted from it. The paragraph above is an example of the
considerations. Any attempt to reduce our material requires




comparison  of  similar  periods.  In  Table  1  above  we  find  periods 
of  the  following  structures:  H  R  K,  H  R  L  K,  L  K,  H  R  L,  appa- 
rently in  no  particular  order.  A  few  interchanges  in  order  of 
periods  give  us  five  successive  sets  of  identically-structured  periods, 
each  of  which  can  be  compactly  summarized.  (Any  connectors 
between  periods  will  be  noted  even  after  the  shift;  and  the  shifted 
periods  which  are  not  tied  by  connectors  seem  to  fit  semantically 
in the new position as well as in the old.) Table 2 (pp. 52-53)
presents these five sets of slightly rearranged periods, with similar
periods combined. Metadiscourse periods which contain (pronouns
of) particular H R L K periods have been transformed into
introducers  or  connectors  of  those  periods.  The  remaining  meta- 
discourse periods  are  included  in  parentheses  or  omitted  (periods 
1-2,  6-8,  11-12,  13,  20,  21,  25,  39,  43,  82)  as  incidental  comments. 
Some  adjunct  phrases  are  also  omitted  from  the  periods  here. 
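Bringing identically structured periods together amounts to grouping them by which of the columns H, R, L, K they fill. A small sketch under that reading follows; the function names and the dictionary representation are hypothetical, and the sample entries are abbreviated from Table 1.

```python
def structure(period, columns=("H", "R", "L", "K")):
    """Name a period by the columns it fills, e.g. 'H R K'."""
    return " ".join(col for col in columns if period.get(col))


def group_by_structure(numbered_periods):
    """Collect period numbers under their structure, keeping textual order."""
    groups = {}
    for number, period in numbered_periods:
        groups.setdefault(structure(period), []).append(number)
    return groups


# A few periods, abbreviated; the numbers are their addresses in Table 1.
sample = [
    (5, {"H": "proteins", "R": "optical rotatory power", "K": "the wavelength"}),
    (33, {"L": "lambda_c", "K": "drastic changes in the protein system"}),
    (37, {"H": "a number of proteins", "R": "[alpha]", "L": "lambda_c"}),
    (41, {"L": "lambda_c", "K": "temperature"}),
]
groups = group_by_structure(sample)
```

Periods 33 and 41, which are separated in the text, fall into the same L K set and can then be summarized together.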

3.  REDUCTION  OF  THE  COMPACT  TABLE 


Table 2 (pp. 52-53), then, fits the main metadiscourse periods
into the C column, and rearranges the regular periods into five
successive sets H R K, L K, H R L K, H R L, H R K. Within each
column the following development may be noted (this was roughly
visible in Table 1 also):

TABLE  3 


H                     R                  L             W                 K

proteins,             rotations                        depend on         wavelength,
polypeptides                                                             etc.
                                         A, λc
    ↓                     ↓                  ↓             ↓                 ↓
                                         Group A, B    under             -
                      structure, etc.
A-chain, clupein,                                      change in,        solution, etc.
not insulin                                            exist in, etc.


TABLE 2

Period | C | Period | H | R | L | W | K

19 | recorded here: | 5, 15, 18 | proteins | have optical / specific rotations | | dependent on | wavelength
   |  | 14 |  |  |  |  | temperature
   |  | 16 |  |  |  |  | pH
   |  | 17 |  |  |  |  | denaturation
   |  | 3-4, 10, 28 |  |  |  |  | experimental conditions
   |  | 45-6 | proteins, not insulin | have optical / specific rotations | | not dependent on | moderate changes in protein concentration

   |  | 26-7 |  |  | A, λc | (of theoretical significance; here empirically determined) |
   |  | 29 |  |  | A | dependent on | temperature
   |  | 30 |  |  |  |  | pH
   |  | 32 |  |  |  |  | denaturation
   |  | 31 |  |  |  |  | ionic strength
   |  | 32 |  |  |  |  | etc.
   |  | 33 |  |  | λc | only dependent on | drastic changes in protein system
   |  | 36 |  |  |  |  | titration to pH 10.5 and 12.0
   |  | 34-5 |  |  |  |  | denaturation by urea or guanidine
   |  | 41-2 |  |  | λc | not dependent on | temperature

21 | found: | 24 | polypeptides | obey [α] | = A (−λc² | + | λ²)⁻¹ (λ: wavelength)
   |  | 40 |  | [α]²⁰_D | = A20 (−λc² | + | λD²)⁻¹
   |  | 37-8 | proteins | have value of [α]²⁰_D | and of λc | under | conditions
   |  | 44 | proteins | have ... value of −[α]²⁰_D | of λc (Group A) | under | conditions ... Table 1
   |  | 47 | proteins | have ... value of −[α]²⁰_D | of λc (Group B) | under | conditions ...

48 | 44 shows pattern: | 49 | proteins | native globular, all | in Group A |  |
   |  | 50 | proteins | not native globular, all | in Group B |  |
52 | 44 indicates: | 53 | polypeptides | have secondary structure | measured by λc |  |
   |  | 54 | proteins | native ordered configurations | have λc > 2400 Å (= Group A) |  |
   |  | 55 | polypeptides | disordered states | have λc < 2300 Å (= Group B) |  |
62 | A-chain not expected: | 59, 61 | clupein, A-chain |  | in Group B |  |

8  | detect: | 9 |  | structure |  | changes in | solution
56 | view: | 57-8 | a protein | has unfolding of hydrogen bonded structure |  | led-to by | heat, chemical denaturation
   |  | 60 | clupein | has folded molecular forms unstable |  | made by | internal repulsion due to high positive charge
63 | hoped: | 64 | A-chain | as α-helix |  | would exist in | solution
77 | 71-6 suggest: | 78 | A-chain | is largely unfolded |  | in | aqueous solution
66 | Instead of 64 found: | 67-9 | clupein, A-chain, not insulin | in specific rotations |  | (have resemblance) |
70 | striking fact (result): | 71-6 | clupein, A-chain, as against ordinary proteins, insulin | have specific rotations |  | unaffected, as against undergo changes of 100% to 300%, in | strong solutions of urea and guanidine chloride
79 | 71-6 agree with finding: | 80-1 | A-chain, not insulin | have peptide hydrogen |  | exchanging readily with | D2O




Table 3 no longer contains the full material of the article, but
shows its general structure: The first two sections state that the
rotations of proteins, and their intermediate factors A, λc, depend
on a set of experimental conditions; the third section gives a
formula and table connecting all of these; the fourth and fifth
sections talk about particular proteins in contrast to others, with
structure and other interpretive words appearing instead of
measurable rotations; and the connectors provide them with an argument
development.

We return now to the complete compact table (Table 2) to see
how the full information can be exhibited in organized form.
Table 4 shows the first two sections above, and Table 5 the last two.

The first two sections can be combined into a correlation of
R of H, or L, as affected (+) or not (-) by various K:

TABLE  4 


\^            K 
HR,L    \^ 

protein 
concen- 
tration 

wave- 
length 

temp- 
era- 
ture 

pH 

denaturation 

ionic 
strength 

only 
drastic 
changes 

only 
iO.5 
-12 

only 

urea 

guanidire 

R  of  proteins 
R  of  insulin 

- 

- 

^ 

'- 

A 

- 

.u 

+ 

/. 

— 

- 

+        ■ 
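A correlation table of the kind shown in Table 4 can be assembled mechanically from statements of the form "X is (not) dependent on K". The sketch below is illustrative only: the function name is hypothetical, and the statements are a partial selection paraphrased from the first two sets of Table 2, not a full transcription of Table 4.

```python
def correlation_table(statements):
    """Build {row: {condition: '+' or '-'}} from (row, affected, condition) triples."""
    table = {}
    for row, affected, condition in statements:
        table.setdefault(row, {})[condition] = "+" if affected else "-"
    return table


# Partial list of statements paraphrased from the compact table.
statements = [
    ("R of proteins", True, "wavelength"),
    ("R of proteins", True, "temperature"),
    ("R of proteins", False, "protein concentration"),
    ("A", True, "temperature"),
    ("A", True, "ionic strength"),
    ("lambda_c", False, "temperature"),
    ("lambda_c", True, "drastic changes"),
]
table = correlation_table(statements)

# Conditions for which a row has no statement remain unspecified.
unspecified = "ionic strength" not in table["lambda_c"]
```

Cells for which the article makes no statement simply stay absent, which makes the unspecified slots of such a table easy to list.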

The  third  section  contains  the  formula  and  table  of  data,  and 
takes  no  further  tabulation. 

The  fifth  section  can  be  tabulated  as  a  correlation  of  clupein, 
the  A-chain,  and  insulin  (h)  against  a  connected  (C)  string  of 
properties  (r  k).  Since  the  fourth  section  simply  distinguishes 
Groups  A  and  B,  which  are  in  turn  a  property  of  clupein  and  the 
A-chain,  we  can  include  this  section  in  the  tabulation. 


TABLE 5

[Rotated table; only the following entries are recoverable, in the order in which they survive.]

instead of: hoped (66); expected (62); (61, 64-5)
already unfolded
hence (77) (60, 78)
and: found in (49-61) Group B
not native globular, with disordered states
and: rotations (67-9)
agreeing with: peptide hydrogen atoms, in respect to D₂O (80-1)
but (70) in strong solutions of urea, guanidine the rotations are (71-6) changed 100% to 300%
since (56) in denaturation hydrogen bonded structure is (58)
insulin (except A-chain)


4.  DISCUSSION 

We  see,  then,  that  the  article  can  be  summarized  into  Tables  4,  5 
plus  the  formula  and  data-table.  There  are  obvious  similarities 
and  differences  between  the  structures  of  Tables  4  and  5.  Each 
table  permits  a  critique  of  the  section  it  summarizes.  In  Table  4, 
we  see  that  the  difference  in  status  of  the  various  k  was  not  indi- 
cated, and  many  slots  are  left  unspecified.  In  Table  5  we  see  that 
the  line  of  argument  (C  operating  on  r  k,  for  each  h)  fails  to  come 
out  clearly. 

The  methods  used  in  this  analysis  were:  transformations  (which 
gave  us  the  optimal  transform  in  Table  1,  and  the  later  reduction 
of  many  metadiscourse  periods  into  C  of  Table  2);  comparison  of 
the  periods  (which  suggested  which  periods  should  be  grouped 
together  for  Table  2,  which  metadiscourse  periods  could  be  made 
into  C,  what  were  the  main  shifts  within  each  column  in  Table  3, 
and  how  the  column  entries  of  one  column  varied  with  respect  to 
another  as  in  Tables  4,  5);  and  knowledge  of  the  less  specialized 
word  meanings  (which  was  used  in  checking  that  the  reordering  of 
some  periods  in  Table  2  did  not  introduce  meaning  changes,  in 
deciding  the  relevant  synonymity  of  some  words  for  the  coalescing 
of  periods  in  Table  2,  and  in  summarizing  the  covariation  of 
columns  for  Tables  4,  5). 

It  turned  out  that  collecting  (reordering)  similar  periods  made 
the  structure  of  the  article  clearer,  and  that  each  section  of  the 
article  -  general  statements  of  fact,  data,  and  argument  -  consisted 
of  a  set  (not  in  all  cases  ordered)  of  similar  periods.  The  different 
sections  of  the  article  differed  either  in  the  classes  which  the  periods 
contained,  or  in  the  major  entries  in  the  classes  (e.g.  the  critical 
and  final  h  r  k  sections,  in  Table  3). 

Analysis  of  other  articles,  and  theoretical  study  of  the  language 
of  science  and  of  connected  scientific  statement,  should  extend  and 
simplify  the  procedure  for  obtaining  compact  and  reduced  tables 
out  of  the  table  of  periods  of  an  article,  and  may  yield  an  inspectable 
method  of  scientific  statement,  somewhat  along  the  lines  of  Tables 
4,  5.   The  formula  and  data-table  were  already  at  this  stage. 


APPENDIX 
DISCOURSE  ANALYSIS  OF  A  STORY 


The Very Proper Gander¹

by 
James  Thurber 


Y  w  G 

1.  Not  so  very  long  ago°  |  there  was  ||^  a  very  fine  gander.

G =W =W =W G

2.  He  s||  was  strong  [  and  smooth]  [and  beautiful]  [and  he  ^|| 

Y  =w  Y 

spent  most  of  his  time  singing  j^'^  to  his  wife  and  children]. 
Y  H  H  u  G  =w 

3.  One  day  ^|  somebody  ^[v^ho  ^\\  saw  \\^  him  ^j  strutting  |^'^  up 

=w  =u  w 

and  down  in  his  yard  and  singing]  ||  remarked,  jj^  "There  is  |^ 

Y  G 

a  very  proper  gander." 
5       H  =u  GW  u'  Y 

4.  An  old  hen  s|j  overheard  ||o  this  [and  told  [^^  her  husband 

HUGW  y 

||o  about  it  I™  that  night  in  the  roost]. 

P  =  H       P  =  U  =GW  H  8U' 

5.  "They  s|  said  |o  something  about  propaganda",  °||  she  ^||  said. 

¹ Reprinted with additional textual markings from James Thurber, Fables for
Our  Time  (New  York:  Harper,  1940),  p.  17.  This  is  the  complete  story  except 
for  the  moral,  appended  at  the  end  of  the  story  in  italics  (in  the  original),  which 
reads:  Moral:  Anybody  who  you  or  your  wife  thinks  is  going  to  overthrow  the 
government  by  violence  must  be  driven  out  of  the  country. 



P  =  H  P=  U  P  =  GW  U  H 

6.  "I  ^1  have  always  suspected  |o  that",  ^H  said  |p  the  rooster 

H  u 

[and  he  ^11  went  around  the  barnyard  next  day  telling  every-

G  =  w 

body  11*^  that  the  very  fine  gander  ^|  was  a  dangerous  bird, 

=  w 
[more  than  likely  a  hawk  in  gander's  clothing]]. 

Y  H  t:  Y 

7.  A  small  brown  hen  ^\\  remembered  |j°  a  time  when  at  a  great 

H  Y    u  G 

distance  ^^|  she  ^\  had  seen  |o  the  gander  ^   talking  to  some 
hawks  in  the  forest. 
P=G  p=w  H  V 

8.  "They  ^|  were  up  to  no  good",  o||  she  ^||  said. 

H  u  G  =  w 

9.  A  duck  ^Ij  remembered  \\^  that  the  gander  s|  had  once  told 
him  that  he  did  not  believe  in  anything. 

G  =  w  u  H 

10.  "He  s|  said  to  hell  with  the  flag,  too",  o||  said  |p  the  duck. 

Y  H  u  H  Y  i^ 

11.  A  guinea  hen  ^\\  recalled  \\^  that  she  s|  had  once  seen  |o  some- 

Y  G  =  w 

body  who  looked  very  much  like  the  gander  ^i  throw  some-
thing  that  looked  a  great  deal  like  a  bomb.

H  =  YG 

12.  Finally  ^|  everyone  ^1|  snatched  up  sticks  and  stones  [and 

=  Y  =  G 

descended  on  |o  the  gander's  house]. 
G  w  Y  w 

13.  He  s||  was  strutting  j^^  in  his  front  yard,  [singing  |p^  to  his 

Y 
children  and  his  wife]. 

G        w  =  H  u 

14.  "There  ^\  he  ^\  is"!  ^\\  everybody  s|j  cried. 

=  W     =  G 

15.  "Hawk-lover!" 

=  W       G

16.  "Unbeliever!" 




=  W      G 

17.  "Flag-hater!" 

=  w  G 

18.  "Bomb-thrower!" 

H  Y  G  -  Y  G 

19.  So  D|  they  ^\\  set  upon  ||o  him  [and  drove  |o  him  \^^  out  of 
the  country]. 

The  marks  on  the  text  indicate  the  complete  analysis  of  the  text, 
except  for  some  details  and  uncertainties  (which  will  be  included  in 
the  discussion  below).  Analysis  of  this  text  requires  only  the  most 
common  grammatical  relations  and  transformations. 

To apply the equivalence operations (α, β, γ below) we need to
know  the  grammatical  relations  among  the  parts  to  be  equated. 
Hence  every  sentence  is  analyzed  into  constituents  (with  their 
grammatical  relations  noted),  as  far  down  as  is  required  for  the 
equivalence  operations.  If  one  sentence  has  a  greater  hierarchy  of 
constituents  than  the  other  sentences,  the  extra  structure  need  not 
be  analyzed,  since  the  equivalence  operations  (which  match  struc- 
tures) will  make  no  use  of  it.  Since  the  sentences  are  the  constitu- 
ents of  the  text,  with  no  grammatical  relation  among  them,  this 
covers  the  text.  We  use  the  following  notation 


m D|   n S||   p   ||O q   |PN r


to  indicate  that  n  is  the  subject  S  of  the  verb,  p,  and  that  q  is  its 
object  O  (the  verb,  p,  remains  unmarked);  m  is  an  adverbial  phrase 
D;  r  is  a  preposition  plus  noun  phrase  PN  (either  included  in  the 
object  or  else  constituting  an  indirect  object  of  the  verb). 
The  following 

u   ||O v S|   w   |O x S|   y

indicates  that  v  w  x  y  is  the  object  of  the  verb  u ;  but  that  within  this 
object  phrase  v  is  the  subject  of  the  verb  w  and  x  y  is  its  object;  and 
within  the  x  y  phrase  x  is  the  subject  of  the  verb  y : 


They   saw   ||O us S|   take   it.



[ ] incloses independent sentence structures which are connected
by and or wh-. The sentence structure v w x y above is not
independent, since it constitutes the object of a u; and its morphology
differs from that of independent sentences (us take it instead of
we took it), although its constituents relate to each other as do the
constituents of independent sentences.


1.  GRAMMATICAL  TRANSFORMATIONS 

In  addition  to  stating  the  constituents  and  their  relations,  we  make 
use  of  a  few  grammatical  transformations,  primarily  about  pro- 
nouns (equivalent  to  particular  nouns  in  their  neighborhood)  and 
conjunctions  (equivalent  to  an  array  of  partially-identical  senten- 
ces). 

a. Bound Pronouns. It is possible to discuss what a pronoun is
equivalent to, within a text, if we recognize the distributional
relation of the pronoun to a particular noun in its neighborhood. In
The fellow who wrote it just left and The books of poetry which are
left have been reduced in price, we consider -o and -ich pronouns,
and wh- a connective (relative) between the included sentence
structure (e.g. -o wrote it) and the encircling sentence (The fellow ...
left). Several conditions apply to these pronouns: First, there is
always a noun phrase preceding them (fellow, books of poetry).
That is to say, they always have a noun phrase in a particular
syntactic position in respect to them. Other pronouns have a noun
phrase occupying some other position in respect to the pronoun.
We will call this "F position with respect to the pronoun". For
-o, -ich, the F position is the preceding noun-phrase position. We
have here a restriction of distribution: the pronoun never occurs
without its F-position noun, though the noun often occurs without
the pronoun. Second, the choice of -o or -ich depends on the choice
of noun in the F position; so that we have an agreement relation
between the pronoun and the F-position noun. Third, the
grammatical agreements of the pronoun in the included sentence are
the same as those of the F-noun in the encircling sentence. (The
book which was left was reduced in price, but The books which were
left were reduced in price.) Finally, the selection of particular verbs,
etc., which occur with the pronoun is the same as the selection of
verbs, etc., which occur with the F-noun: compare all the verbs which
would occur (in some very large sample of English) in The book
which ____ (fell, was left, was reduced, etc.) with all the verbs that
would occur in The book ____ (fell, was left, was reduced, etc.). Clearly,
then, the pronoun is not distributionally independent of its F-noun.
Its dependence upon the F-noun can be expressed (after the manner
of long components) by saying that the pronoun carries over (or
continues or repeats) the F-noun into the pronoun position, or that
the pronoun is equivalent to a reoccurrence of its F-noun in the
pronoun position.

b. Loose Pronouns. Certain other pronouns have the same
general relations as above, except that there is more than one
position that can function as F position with respect to them. In
When the ball hit the window it ... the F position for it is either the
subject or the object of the preceding clause. Usually, other
occurrences of these nouns in the preceding sentences will increase
the probability that one or the other is the F-noun of it in this
particular sentence. In any case it will usually be equivalent (in
the sense above) to one or the other of these nouns: for the verb
after it in the above sentence will usually be one of the verbs that
may occur after The ball ____ or one of the verbs that may occur after
The window ____ (either bounced off, rolled away, crashed right through,
etc. or else shattered, cracked, broke, wasn't even damaged, etc.).
Sometimes, however, these pronouns have no F-position noun.
Consider, for example, When the ball hit the window, it got scared
and flew away. Here some noun (e.g. bird) which occurs with flew
away may have appeared somewhere in the preceding sentences, or
may not have appeared at all. These pronouns can therefore occur
without the dependence on an F-position noun. But far more
frequently they occur with an F-position noun; and often it is
possible to decide which of two possible positions has the F-noun,
e.g., by comparing the verb of the pronoun with the verb selections
of the two F candidates. The result is a probability statement based
on distributional data, not an absolute statement based on semantic
grounds. Given When the ball hit the window, it bounced off, we
say that the F-noun of it in this sentence is probably ball, with the
relative probability that bounced off has for occurring as the
predicate of the ball rather than as the predicate of the window; but there
remains a much smaller probability of the F-noun being the window,
or of neither of these being the F-noun. We can now say that it in
this sentence has this probability of equivalence to the ball.
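
The probability statement above can be sketched in code. The following is only an illustration under stated assumptions: the selection counts are invented (no such sample is given in the text), the function name is hypothetical, and the residual probability that neither candidate is the F-noun is not modeled.

```python
from collections import Counter

# Hypothetical counts of how often each verb was observed as predicate of
# each noun in some large sample of English. The numbers are invented.
SELECTION = {
    "ball": Counter({"bounced off": 40, "rolled away": 25, "shattered": 1}),
    "window": Counter({"shattered": 30, "cracked": 20, "bounced off": 2}),
}


def f_noun_probabilities(verb, candidates, selection=SELECTION):
    """Relative probability that each candidate is the F-noun of a loose
    pronoun, judged only by how often `verb` occurs with that candidate."""
    counts = {noun: selection.get(noun, Counter())[verb] for noun in candidates}
    total = sum(counts.values())
    if total == 0:
        # The verb selects neither candidate: the pronoun may have no
        # F-position noun here (as with "it got scared and flew away").
        return {noun: 0.0 for noun in candidates}
    return {noun: count / total for noun, count in counts.items()}


probs = f_noun_probabilities("bounced off", ["ball", "window"])
no_f_noun = f_noun_probabilities("flew away", ["ball", "window"])
```

With these invented counts, ball comes out as the far more probable F-noun of it in it bounced off, while flew away selects neither candidate.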

c. Parallel Constituents. All sentences and included
constituents which contain a conjunction (and, comma, etc.) are
equivalent to two sentences as follows: if A₁ and A₂ are two
stretches having identical syntactic status, with a conjunction C
between them, and if Z is the rest of the constituent or sentence in
which they occur, then A₁ C A₂ Z is equivalent to A₁ Z C A₂ Z.
For example: He was strutting in his front yard, singing to his
children and his wife is a transform of He was strutting in his front yard,
he was singing to his children and his wife; and the second half is
further a transform of He was singing to his children and he was
singing to his wife. The chief grounds for this are: First, whatever
the syntactic relation of A₁ C A₂ is to Z in an A₁ C A₂ Z constituent
or sentence, is also the syntactic relation of A₁ to Z in a constituent
or sentence consisting just of A₁ Z, and it is the syntactic relation
of A₂ to Z in a constituent or sentence consisting just of A₂ Z.
Thus in He was singing to his children and his wife, the phrase to his
children and his wife is the indirect object of He was singing. But
if we form the shorter sentence He was singing to his children, then
to his children is the indirect object of He was singing; and if we
form the sentence He was singing to his wife, then to his wife is the
indirect object of He was singing. Thus A₁ C A₂, A₁ alone, and A₂
alone, all have the same relation to Z in the corresponding
sentences. Second, the selection of Z which occurs with A₁ alone, and
with A₂ alone, is in general the same as the selection of Z for A₁ C
A₂: compare the subject-verb sequences that occur (in a large
sample of English) in place of ____ in ____ to his children
and his wife with those that occur in the same place in ____ to his
children or in ____ to his wife. The expansion of a constituent
into two parallel constituents connected by a conjunction can only
be done within the boundaries of that constituent alone. For
example, He spent some of his time singing to his wife and children
can not be transformed into He spent some of his time singing to his
wife and he spent some of his time singing to his children. But it can
be transformed into He spent some of his time singing to his wife
and to his children. So that to his wife and children is equivalent to
to his wife and to his children. Expansions beyond the original
constituent are also possible, but under particular conditions.
There are also special conditions which apply to the various
conjunctions, and to particular intonational or constituent formation,
such as He went and he told. But these details are not important for
the present text.
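
The transform A₁ C A₂ Z → A₁ Z C A₂ Z can be illustrated with a small string sketch. It assumes that the analyst has already chosen the frame Z (written here with a `{}` slot, a convention introduced only for this example) and has already checked that the expansion stays within the constituent; the function name is hypothetical.

```python
def expand_parallel(frame, a1, conj, a2):
    """Expand 'Z(A1 C A2)' into 'Z(A1) C Z(A2)'.

    `frame` is the rest of the constituent or sentence (Z), with a single
    '{}' marking the position of the conjoined stretch.
    """
    if frame.count("{}") != 1:
        raise ValueError("frame must contain exactly one '{}' slot")
    return f"{frame.replace('{}', a1)} {conj} {frame.replace('{}', a2)}"


expanded = expand_parallel("he was singing to {}", "his children", "and", "his wife")
```

The sketch does not itself enforce the boundary restriction stated above; choosing a frame wider than the constituent (such as He spent some of his time singing to {}) would produce exactly the transform the text excludes.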

d. Parallel Quotes. Sentence sequences of the form P "Q.
R" are transforms of P "Q". P "R". For example, He said "I was
there. But I didn't see it" is equivalent to He said "I was there."
He said "But I didn't see it". The grounds are as in c above. The
quoted material is the object of the preceding subject + verb, and
we can take each grammatically parallel portion of it as being
independently the object of that subject + verb.
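
As a brief illustration, the quote transform can be written as a one-line function. It assumes the quoted material has already been divided into its grammatically parallel sentences; the function name is hypothetical.

```python
def split_parallel_quotes(p, quoted_sentences):
    """Turn P "Q. R" into the sequence P "Q". P "R"."""
    return [f'{p} "{q}"' for q in quoted_sentences]


intervals = split_parallel_quotes("He said", ["I was there.", "But I didn't see it."])
```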

e.  Separable  Clauses:  [  ]  in  the  text.  Two  independent 
sentence  structures  which  are  connected  by  and  or  comma  can  be 
replaced  by  the  mere  succession  of  those  two  sentences,  without 
and,  with  a  period  after  each.  This  is  only  a  convenience  for 
writing  the  analyzed  sentences  in  an  array  of  intervals  each  with 
its  own  period.  It  will  not  be  justified  here  and  can  be  disregarded. 


2.  DISCOURSE  OPERATIONS 

α. Same Relation to Same Environment. Two occurrences of
the same morpheme are tentatively set equivalent =₀ to each other.
If the environments of these two (which we identify partly on the
basis of these tentative equivalences) turn out to be identical in
some considerable degree, then the tentative equivalence is made
definite. If two morphemes or sequences are =₀ and are equivalent
by virtue of the grammatical operations above, they will be called
"same". Two morphemes or sequences which occur in the same
grammatical relation to the same morphemes are equivalent =₁
to each other. (In general, if in the sentences A Z., B Y., sequence
A has a given grammatical relation to sequence Z, and sequence B
has the same relation to Y, and if Z =ₙ Y, then A =ₙ₊₁ B.) All
sequences which are same or equivalent to each other are grouped
into one equivalence class. In this text the classes are marked G,
W, H, U, Y. A sequence which is =₁ to some member of, say, G
is marked =G. (This text does not go beyond =₁.) A sequence
which is same with (or pronoun of) some member of G is simply
marked G. Roughly, we may say that this operation puts into one
class those sequences whose environments are at least once in the
same class, i.e., the sequences which at least once have same or
equivalent environments. This operation is not so powerful as to
impose a structure upon any text, since A could have the same
environment as B in one place, but elsewhere have environments
that are known otherwise to be unequivalent. E.g., if in a text
Z =₁ Y, and A Z and B Y occur, then A =₂ B. If A B also occurs,
then Z =₁ B, and all the sequences become members of the same
class Y = Z = B = A, and if those components exhaust the
sentences every sentence is a succession of the same class. It is possible
to devise texts in which such collapsings occur (since
"unequivalence" is not defined, the analysis of such texts leads to a
collapsing of all the distinctions rather than to contradictions).
However, most texts do not lead to such collapsings under this
operation; and various types of texts differ in various ways from a
collapsed result. (See 1 and 2 in the comments below.)
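
The iterative character of this operation, including the collapse described above, can be sketched with a union-find structure. This is an illustrative simplification and not the procedure of the text: each interval is reduced to a hypothetical (subject, predicate) pair, the degree n of the equivalence is not tracked, and all names are assumptions.

```python
class Classes:
    """Minimal union-find over morphemes or sequences."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the classes of a and b; return True if anything changed."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def alpha(intervals, seed=()):
    """Group sequences that stand in the same relation to equivalent
    environments. `intervals` is a list of (subject, predicate) pairs;
    `seed` lists pairs already known to be equivalent."""
    classes = Classes()
    for a, b in seed:
        classes.union(a, b)
    changed = True
    while changed:  # repeat until no new equivalence is derived
        changed = False
        for s1, p1 in intervals:
            for s2, p2 in intervals:
                if classes.find(s1) == classes.find(s2) and classes.union(p1, p2):
                    changed = True
                if classes.find(p1) == classes.find(p2) and classes.union(s1, s2):
                    changed = True
    return classes


# Sentences 1-2 of the story, with "he" already replaced by "gander".
story = alpha([("gander", "there was"), ("gander", "was strong"),
               ("gander", "was beautiful")])

# The collapsing case: Z = Y given, and A Z, B Y, A B all occur.
collapsed = alpha([("A", "Z"), ("B", "Y"), ("A", "B")], seed=[("Z", "Y")])
```

In the first call the three predicates fall into one class (W) distinct from gander (G); in the second, all four sequences end up in a single class, which is the collapse the text describes.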

β. Same Position in Same Constituent. If D E = F G, and if
the grammatical relation of D to E is the same as that of F to G,
then D = F and E = G. Whereas α equated those morphemes
that had the same relations to the same environments, β equates
those morphemes that have the same relations to different
environments but where the two stretches of morpheme plus environment
are known to be equivalent. (See 5 and 10 in the comments below.)
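
A minimal sketch of β, assuming each of the two equivalent stretches has already been analyzed into constituents labeled by grammatical role (the role labels and the function name are introduced here only for illustration):

```python
def beta(stretch_1, stretch_2):
    """Given two stretches known to be equivalent, each a dict mapping a
    grammatical role to the constituent filling it, return the pairs of
    constituents that beta sets equivalent. If the two stretches do not
    have the same internal relations, nothing is equated."""
    if stretch_1.keys() != stretch_2.keys():
        return []
    return [(stretch_1[role], stretch_2[role]) for role in stretch_1]


# Sentence 5: the quote is equated as a whole with H U (G W).
quote = {"subject": "They", "verb": "said", "object": "something about propaganda"}
known = {"subject": "H", "verb": "U", "object": "GW"}
pairs = beta(quote, known)
```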

γ. Irrelevant Modifiers. If a morpheme or sequence J which
appears in the analysis occurs in some places as the grammatical
head J of a constituent K, and if the rest of K (the portions which
occur with the J head, as its grammatical "modifiers") occur
nowhere else in the text, or occur only in the same relation again
to a J head, then J =₀ K. We set this equivalence because the J
head of K is already =₀ J, and the remainder of K, which relates
only to the head, cannot affect the relation of the J head to other
parts of the sentence; furthermore, since the modifying morphemes
occur nowhere else in the text, they cannot indirectly involve K in
any equivalences in which J, which lacks those morphemes, would
not be involved. As a consequence, the relation of a given K to
anything in the text is not different from what it would be if we had
a J there in the place of K. (See 1 and 3, or 2 and 3 below.)

There are important restrictions upon the application of γ, of
the same order as for the first two sentences of α (concerning =₀).
For example, two given modifiers of J may correlate with two
different environments of J without having any morphemes in
common. Although many specific restrictions can be set up, to
indicate when γ may not be applied, the final test is whether the
use of γ in a particular case contributes ultimately toward a collapse
of some large part of the whole text analysis. Such an effect may
appear not directly, but only after certain other equivalences have
been applied. The only example in the present text is the
somebody ... something ... of 11. Every application of γ is therefore
tentative; the alternative is always to leave K non-equivalent to J.
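
The basic condition for γ (the modifiers occur nowhere else, or only again with a J head) can be sketched as a check over a list of analyzed constituents. This is a simplified illustration: the data layout and names are hypothetical, the example data are schematic rather than a full analysis of the story, and the final test named above (whether the equivalence contributes to a collapse) is not covered.

```python
def gamma_applies(head, modifiers, constituents, other_words=()):
    """Return True if the modifiers of `head` occur nowhere else in the
    text except as modifiers of the same head.

    `constituents` lists every (head, modifiers) pair found in the text;
    `other_words` holds the remaining words of the text.
    """
    mods = set(modifiers)
    if mods & set(other_words):
        return False  # a modifier also occurs outside any such constituent
    for other_head, other_mods in constituents:
        if other_head in mods:
            return False  # a modifier occurs elsewhere as a head
        if other_head != head and mods & set(other_mods):
            return False  # a modifier also modifies a different head
    return True


text = [("gander", ("very", "fine")), ("gander", ("very", "proper")), ("hen", ("old",))]
ok = gamma_applies("gander", ("very", "fine"), text, other_words={"strong"})
blocked = gamma_applies("gander", ("very", "fine"), text + [("hawk", ("fine",))])
```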

δ. Semantic Assumptions. Every so often we reach a point in
a text where the preceding operations do not suffice to enable the
analysis to proceed. We therefore make a semantic extension to
the preceding distributional operations: If a morpheme or sequence
M has roughly the same semantic relation to its environment as
another sequence N has to its semantically equivalent environment,
we may include M semantically in the class of N. This operation,
designed as an analog to α, is based on semantic knowledge and
judgment. We try to apply it sparingly, in a way that leads to
fewest applications of δ and most applications of α. We also try
to use it where the environments of M and N are as similar as
possible in terms of the preceding operations; so that if the text
had had one or two additional sentences of the kind it already has,
it might have been possible to derive our M = N equivalence
formally by the preceding operation. In this way the semantic
assumption merely bridges a small gap in a formal structure. At
the same time we try to use it where M and N seem intuitively to
fit the same class, as these classes have been taking shape on formal
grounds; we do not want to use it to establish equivalences which
the text does not obviously suggest, and which depend more on
our own opinions. In this way the less obvious equivalences in a
text come out by the formal operations, with the aid of relatively
simpler semantic assumptions made at other points in the text.

These grammatical and discourse operations (with the exception
of δ) are applied whenever the conditions for them exist in the text,
not  at  the  discretion  of  the  analyst.  In  some  cases  several  alterna- 
tive paths  open  up.  No  matter  which  is  followed  we  would  have  an 
analysis  of  the  text.  The  analyst  can  make  explicit  choices  among 
such  alternatives  in  order  to  select  a  more  fruitful  analysis.  He  can 
also  search  for  equivalences  that  are  not  obvious,  for  points  where 
a  discourse  equivalence  can  be  applied  only  after  a  particular 
succession  of  grammatical  operations  prepares  the  way  for  it.  And 
he  decides  what  assumptions  of  semantic  equivalence  to  make. 


3.  APPLICATION  TO  THE  TEXT 

To see how these operations were applied to our text we interpret
the  marks  written  above  the  text. 

1. The subject and verb-phrase are arbitrarily marked G and W.
The adverbial phrase "modifies" the verb and occurs only here
(irrelevant modifier γ); similarly for One day in 3, etc.

2. The two full sentence structures are separated by e (replacing
the and by period). The first one is filled out into three parallel
sentences by repeating their He was (by c), and these too are
separated by e. We obtain three "intervals", i.e., textually parallel
stretches which (like sentences) have no systematic grammatical
dependences to anything outside their boundaries: He was strong.
He was smooth. He was beautiful. All of these have the subject he,
which repeats gander (pronoun b). Since sentence 1 consisted of
the subject gander plus the predicate there was (aside from the
adverbial phrase), we derive was strong =₁ there was, and so for
the other predicates; all are W. This also applies to the predicate
... singing ... of the second half of 2.

It is possible to break this up into two intervals, He spent ...
and He sang ..., by a new statement, f, that verb + verb-ing is
grammatically equivalent to two predicates. But it is also possible
to show in terms of selection (e.g. He starts singing, but not He
sings starting) that in some cases the first verb modifies the second.

Since his could be connected with he, and hence G, we could get
two intervals out of the spent ... phrase, leading to something like
G spent most ... time. G had time. To do this, we would use a new
discourse operation: If CD is a grammatically unremovable part
of ABCD, but at the same time has the structure of an interval in
its own right, we set up two intervals: A + B + (CD). C + D.
This can be used again in 3, 7, 9 (e.g. 7: She saw the gander. The
gander talked ...). Alternatively we can say that he in an adverbial
phrase modifying W is the same morpheme but with different
textual status than he = G, as we might say that look in noun
position is the same morpheme as look in verb position but with
different syntactic status. Then G becomes, in accordance with α,
not all equivalents of gander, but only those ganders which are
subject of W. A similar discussion would apply to her in 4, etc.

3. We have two sentence structures: somebody remarked "____"
and -o saw him ..., connected by wh-. Since -o repeats somebody
(bound pronoun a), we put both into a class H. For convenience,
we mark the verb saw with U. Its object is him strutting ... and
singing. By building up parallel clauses and separating them (c
and e) we get HU him strutting ..., plus HU him singing. Here him
(which equals he + object element) repeats the preceding G (loose
pronoun b); and within the object the he that is included in him is
the subject of singing, which has appeared before as W, the
predicate of G. Hence we have HU (GW), with (GW) as the object of
HU; and with G as the subject of W, as it is when GW occurs alone.
Thus HU him strutting is HUG strutting, hence strutting =₁ singing,
and is also W.

The encircling sentence is H remarked "There is a very proper
gander". Allowing for proper in place of fine (irrelevant modifier
γ), the subject in the quotes is G; the object within the quotes =
there was if we don't count the difference in stress (again by γ)
and also differences of tense (γ) (though it may be hard to phrase
this in terms of γ, and the semantic δ may be needed). Then we
have H remarked "GW", with GW as object of H remarked: hence
remarked =₁ saw and is in U.

4. An old hen is assumed to be H (semantic δ); grounds are that
7 has she (the hen) had seen the gander talking, parallel to H saw
him (the gander) singing in 3. This is the object of H + verb, and
repeats the object of the preceding H + verb which was GW.
Hence overheard is U. In the second half the subject is H (building
up parallels, c). In order to compare the predicate told ... about it
with the verb + direct object of the preceding interval, we take
told ... about as the verb phrase and it as its direct object (and we
divide analogously elsewhere). (Her husband is an indirect object
PN, and would be replaced by to her husband if it came after it.)
Here the object it repeats the whole HUGW sentence preceding
(loose pronoun b); yielding H told about (HUGW).

5. She repeats hen (b). Said is put into the same class U' as told
about (semantic δ) because of the parallel occurrence of said and
tell as predicates of rooster in 6 (in 4 and 5 these are both predicates
of hen). Then the quote is the object of HU', as it was the object
of HU' in 4. Since it was HUGW, so is the quote (same relation to
same environment, α). We therefore equate the subject of the quote,
They, with the subject H, the verb said with the verb U, the object
something about propaganda with the object GW (same position in
same constituent, β). Since assuming said to be U' led to obtaining
said as U (in both cases with H as subject, though with somewhat
different objects), we combine U and U'.
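
The combining of U and U' in 5 can be pictured as a union of class labels. The following is a minimal sketch in Python, not part of the original analysis: the name EquivalenceClasses and the dictionary of provisional assignments are hypothetical, and only the verbs named in 3-5 above are listed.

```python
class EquivalenceClasses:
    """Union-find over class labels (illustrative helper, not Harris's notation)."""

    def __init__(self):
        self.parent = {}

    def find(self, label):
        """Return the representative label of the class containing `label`."""
        self.parent.setdefault(label, label)
        while self.parent[label] != label:
            # Path halving: point this label at its grandparent as we walk up.
            self.parent[label] = self.parent[self.parent[label]]
            label = self.parent[label]
        return label

    def union(self, keep, absorb):
        """Merge the class of `absorb` into the class of `keep`."""
        root_keep, root_absorb = self.find(keep), self.find(absorb)
        if root_keep != root_absorb:
            self.parent[root_absorb] = root_keep
        return root_keep


classes = EquivalenceClasses()

# Provisional assignments made while working through sentences 3-5.
assignment = {
    "saw": "U",
    "remarked": "U",
    "overheard": "U",
    "told about": "U'",
    "said": "U'",
}

# In 5, `said` (assumed to be U') is also obtained as U, so the labels are combined.
classes.union("U", "U'")

resolved = {word: classes.find(label) for word, label in assignment.items()}
```

After the union every verb listed resolves to the single label U, which is the state of the classes assumed from 6 onward.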

6. The rooster is put in H by parallel occurrence with hen in 5
(semantic δ). This yields HU with a quote as object. Now in 3 the
quoted object of HU is GW; in 5 it is HU (GW). But in the present
quote, the pronoun I is in H because it repeats the rooster (bound
pronoun a), and the object that (which is contained within the
quoted object) is by itself either GW or HUGW (loose pronoun b).
Therefore we take the quoted object here to be HUGW (rather
than GW). The identifications above together with β (same position
in same constituent) then yield: I in H, the verb ... suspected in U,
the internal object that as GW.

The second sentence has he repeating rooster (H, by b) as subject,
and ... telling ... that (which is U) as verb-phrase (for the
division after that, cf. 4). This treatment is like that of ... singing ...
in 2; alternatively, we could break this into two intervals He went ...
and he told .... The object contains G as its internal subject. We
have therefore HU (G was a dangerous bird); comparison with HU
(GW) in 3 puts was a dangerous bird into W. By c and e, we have a
parallel interval he went around ... telling ... that the gander was
more than likely a hawk ..., which puts was ... a hawk ... into W,
and analyzes the whole additional interval into HU (GW). The
occurrence of gander within the predicate W is treated like the
occurrence of his in 2.

7. We put remembered into U on the semantic grounds of the
other verbs following H (δ). We take a time when as a connective
between the major sentence and the included sentence she had
seen ... which is the object of the major sentence (compare She
remembered when she had seen ...). Within this object, she repeats
hen (H), and had seen is U (allowing for the tense as in 3), and the
gander is G (allowing for omission of a very fine by γ). The whole
is then HU (HU (G talking ...)), each parenthesis indicating the
object of what precedes it. Comparison with HU (HU (GW)) in
4, 5, 6 puts talking ... in W.
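
The parenthesis notation, in which each parenthesis marks the object of what precedes it, behaves like a small recursive structure. The sketch below is an illustration only, under the assumption that an interval is either a simple GW or an HU whose object is another interval; the names GW, HU and depth are chosen here and do not come from the text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class GW:
    """A simple interval: subject class G with predicate class W."""

    def depth(self) -> int:
        return 0

    def __str__(self) -> str:
        return "GW"


@dataclass(frozen=True)
class HU:
    """Subject class H and verb class U, with another interval as object."""

    obj: "Interval"

    def depth(self) -> int:
        # One level of embedding more than the object it contains.
        return 1 + self.obj.depth()

    def __str__(self) -> str:
        return f"HU ({self.obj})"


Interval = Union[GW, HU]

sentence_3 = HU(GW())        # H saw (G singing)
sentence_7 = HU(HU(GW()))    # H remembered (H had seen (G talking ...))

print(sentence_3, sentence_3.depth())
print(sentence_7, sentence_7.depth())
```

Rendering an interval reproduces the notation used in the analysis, and depth counts how many HU layers surround the innermost GW.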

8. The object of HU is GW in 3, 4, 6 and HU (GW) in 5, 6, 7.
Since the quoted object here has no object within itself, it is easier
to analyze it as GW than as HU (GW). It follows from β (same
position in same constituent) that the subject they is in G. But by
a, b (pronouns) the plural they cannot repeat the preceding singular
G. It can only repeat the preceding hawks or G + hawks; and in
order to satisfy both conditions β and a, b we will take they as
repeating G + hawks, which also satisfies the meaning.

9. A duck is put in H, by semantic parallelism with hen (δ). This
yields HU (G had told ...); and had told ... did not believe ... is
put in W by comparison with HU (GW). The occurrences of told,
him and he in W are taken as his is in 2. Alternative analyses into
two intervals are possible; the results, while more complicated,
give the same general type of interval.

10. Said to hell ... is W by comparison with other HU (GW).

11. We put recalled in U semantically by the close parallelism
with remembered in 7 (δ). We obtain HU (HU (G throw ...)). The
modifying somebody ... like before G is similar to something ...
like before bomb, so that the use of γ here can be questioned.
However, the morphemes in the two modifiers are not identical.
Alternatively, the two similar modifiers could be taken as one
modifier of the whole GW object, and so perhaps permit the use of
γ. In either case, once the gander phrase is G, then throw ... is W.

12. This sentence differs from the others (and has a semantically
climactic position in the text). Everyone can be put semantically
into H by parallelism with everybody in 14 and they in 19. Then
snatched ... as its predicate would be equivalent to went around ...
in 6, if we analyze 6 as he went ... plus he told GW (separate
intervals). If we divide the second predicate into descended on the house
of and the gander, we could assign these parts to Y and G respectively
(cf. 19).

13. He is G by the preceding gander. Strutting in his yard is W
with the extra modification front here and up and down in 3 (γ).

14. We put cried in U by semantic parallelism with remarked in 3
(δ). He is G by the preceding he (b). There is is W by 3 (any
difference in intonation being a matter for γ). Hence everybody is H.

18. (15-17 cannot be analyzed without a semantic assumption,
unless 18 is analyzed first). In 15-18 the quoted sections are the
objects of everybody cried from 14. Bomb-thrower is therefore the
object of HU. The internal relations within the object bomb-thrower
are object bomb + verb throw + subject -er. A verb-object
relation between throw and bomb has already been met in the
predicate of the GW object in 11, where the application of γ which
leaves the gander head of the final subject phrase can be so arranged
as to leave the bomb by the same token head of the final object
phrase. Then throw bomb in 18 = throw ... bomb in 11 = W. We
then have in 18 HU (-er W); and by comparison with HU (GW),
-er, which has the subject relation to bomb-throw, is G.

15-17. We have HU (G hawk-love), whence hawk-love is W.
Analogously, unbelieve and flag-hate are W.

19. They is H repeating everybody; and him is G, repeating -er
and the preceding he. The identification is due partly to the pronoun
agreement as to number (b, a). Set upon is a new class (verb
of H, but with G instead of GW as object); and drove out of the
country is equivalent to it (α).

4.  STRUCTURE  OF  THE  TEXT 

In  terms  of  equivalence  classes  we  have  obtained  the  following 
intervals: 


 1.  GW                  6.  HU (HU (GW))       13.  GW
 2.  GW                      HU (GW)            14.  HU (GW)
     GW                      HU (GW)            15.  HU (GW)
     GW                  7.  HU (HU (GW))       16.  HU (GW)
     GW                  8.  HU (GW)            17.  HU (GW)
 3.  HU (GW)             9.  HU (GW)            18.  HU (GW)
     HU (GW)            10.  HU (GW)            19.  HYG
     HU (GW)            11.  HU (HU (GW))            HYG
 4.  HU (GW)            12.  H snatched
     HU (HU (GW))            HYG
 5.  HU (HU (GW))

Certain  major  text-structural  features  are  noticeable :  practically  all 
intervals  contain  GW,  which  can  become  the  object  of  HU,  which 
can  then  become  the  object  of  another  HU;  and  the  succession  in 
1-12  is  repeated  in  a  simpler  form  in  13-19. 
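
These observations can be checked mechanically once the table is written down as data. The sketch below is an illustration and not part of the original analysis: the dictionary INTERVALS is a transcription of the table as read here (the grouping of intervals under each sentence number is an assumption where the printed layout is ambiguous), and the function name succession is hypothetical.

```python
from collections import Counter

# Interval types per sentence, transcribed from the table above.
# The grouping under each sentence number is a reading of the printed layout.
INTERVALS = {
    1: ["GW"],
    2: ["GW", "GW", "GW", "GW"],
    3: ["HU(GW)", "HU(GW)", "HU(GW)"],
    4: ["HU(GW)", "HU(HU(GW))"],
    5: ["HU(HU(GW))"],
    6: ["HU(HU(GW))", "HU(GW)", "HU(GW)"],
    7: ["HU(HU(GW))"],
    8: ["HU(GW)"],
    9: ["HU(GW)"],
    10: ["HU(GW)"],
    11: ["HU(HU(GW))"],
    12: ["H snatched", "HYG"],
    13: ["GW"],
    14: ["HU(GW)"],
    15: ["HU(GW)"],
    16: ["HU(GW)"],
    17: ["HU(GW)"],
    18: ["HU(GW)"],
    19: ["HYG", "HYG"],
}


def succession(first, last):
    """Interval types in order of occurrence, consecutive repeats collapsed."""
    out = []
    for n in range(first, last + 1):
        for kind in INTERVALS[n]:
            if not out or out[-1] != kind:
                out.append(kind)
    return out


all_intervals = [kind for kinds in INTERVALS.values() for kind in kinds]
tally = Counter(all_intervals)                      # how often each type occurs
with_gw = sum("GW" in kind for kind in all_intervals)  # intervals containing GW

print(tally)
print(with_gw, "of", len(all_intervals), "intervals contain GW")
print(succession(1, 12))
print(succession(13, 19))
```

Under this transcription the succession for 13-19 reduces to GW, HU(GW), HYG, while 1-12 begins with GW and ends with HYG but alternates between HU(GW) and HU(HU(GW)) in between, which is one way of seeing the sense in which the later stretch repeats the earlier one in simpler form.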

The  semantic  assumptions  were  simple.  They  were  made  in  only 
two  classes:  hen,  rooster,  duck  in  H;  said,  remembered,  recalled, 
cried  in  U.  It  was  preferable  to  make  the  semantic  assumptions  in 
these  classes  because  the  whole  semantic  complexity  of  the  story 
lay  in  G  and  W,  so  that  it  was  important  to  have  G  and  W  estab- 
lished formally.  The  point  of  the  story  takes  on  the  following 
shape  in  terms  of  our  equivalence  classes : 

If we write W' for those W which appear in the simple GW
intervals, then the HUGW and HUHUGW intervals contain only
W' before 5, and never W' (but only other members of W) after 5
(except for a W' at the start, in the second interval of 6). That is
to say, before 5 the members of W when it is associated with HU
are the same as the members of W when it is alone; after 5 the
members of W when associated with HU are entirely different (and
repeat among themselves). What happened in 5, to correlate with
all this, is the occurrence of something about propaganda as the
GW. This was assigned to GW on formal grounds. But in interpretation,
we notice that propaganda is related to G (or GW) not
semantically, as are the other members of G or GW, but phonemically.
The change in the membership of W that correlates with
this is, however, entirely semantic. A phonemic connection in G
thus led to a semantic change in W.

The analysis of this text consisted merely in the carrying out of
the stated operations: the grammatical analysis of each sentence,
the five grammatical equivalences, and the four discourse operations.
Of these, only the last requires judgment of the meanings of morphemes.
Other texts require a few additional grammatical equivalences,
but rarely any further discourse operations. These
operations lead to a breakdown of the sentences into smaller
intervals which have the main characteristic of sentences - that
there is no grammatical relation between one and another (so that
all the grammatical relations are concentrated within the interval).
The relation that remains among the intervals is that of comparing
them in the order in which they occur. The final structural observations
refer therefore either to the succession of interval types, or
to the successive members within each equivalence class (H, G,
etc.) and to the correlations among the various successions.

